Configuring counters from the userspace require the kernel to handle some
logic related to performance counters. Basically, it has to find a free
slot to assign a counter, to handle extra counting modes like B4/B6 and it
must return and error when it can't configure a counter.
In my opinion, the kernel should not handle all of that logic but it
should only write the configuration sent by the userspace without
checking anything. In other words, it should overwrite the configuration
even if it's already counting and do not return any errors.
This patch allows the userspace to configure a domain instead of
separate counters. This has the advantage to move all of the logic to
the userspace.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This adds a new method NVIF_PERFCTR_V0_INIT which starts a batch of
hardware counters for sampling. This will allow the userspace to start
a monitoring session using the INIT method and to stop it with SAMPLE,
for example before and after a frame is rendered.
This commit temporarily breaks nv_perfmon but this is going to be fixed
with the upcoming patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This allows to query the ID, the mask and the user-readable name of
sources for each signal.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
A source (or multiplexer) is a tuple addr+mask+shift which allows to
control a block of signals. The maximum number of sources that a signal
can define is arbitrary limited to 8 and this should be large enough.
This patch allows to define multi-level of sources for a signal.
Each different sources are stored to a global list and will be exposed
to the userspace through the nvif interface in order to avoid conflicts.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This signal index must be always allowed even if it's not clearly
defined in a domain in order to monitor a counter like 0x03020100
because it's the default value of signals.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
16 bits is large enough to store the maximum number of signals available
for one domain (i.e. 256).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will allow to configure performance counters with hardware signal
indexes instead of user-readable names in an upcoming patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This allows to query the number of available domains, including the
number of hardware counter and the number of signals per domain.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Since a new class has been introduced to query signals, we can now
return an error when the userspace wants to monitor unknown signals.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This commit introduces the NVIF_IOCTL_NEW_V0_PERFMON class which will be
used in order to query domains, signals and sources. This separates the
querying and the counting interface.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Tested on a few cards. Probably works quite well for most, given they should
all be GDDR3.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This looks surprisingly similar to scripts on earlier cards as well
but they don't seem to work just yet. That... and I don't have any, which
makes it a tough job to reverse engineer.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Some of the bits in there are similar to the bits in the gt215 rammap.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Might need some generalisation to < GT200. For those: use at your own risk!
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
In preparation of NV50 reclocking, where there is no version
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
_of_init_opp_table_v2() isn't freeing up resources on some errors and
the error values returned are also not correct always.
This fixes following problems:
- Return -ENOENT, if no entries are found in the table.
- Use IS_ERR() to properly check return value of _find_device_opp().
- Return error value with PTR_ERR() in above case.
- Free table if _find_device_opp() fails.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This applies cleanup on pci_dn_reconfig_notifier(), no functional
changes:
* Rename variable "pci" to "pdn" to indicate its purpose clearly.
* The parent node can be released at any time. So it should be
hold with of_get_parent() before accessing it.
* The device node doesn't have to have parent node in theory.
More check on this.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit cca87d30 ("powerpc/pci: Refactor pci_dn") introduced pdn
list for SRIOV VFs. It means the pdn is be put into the child list
of its parent pdn when the pdn is created. When doing PCI hot
unplugging on pSeries, the PCI device node as well as its pdn are
released through procfs entry "powerpc/ofdt". Some one else grabs
the memory chunk of the pdn and update it accordingly. At the same
time, the pdn is still tracked in the child list of parent pdn. It
leads to corrupted child list in the parent pdn.
This fixes above issue by removing the pdn from the child list of
its parent pdn when the device node is detached from the system.
Note the pdn is free'd when the device node is released if the
device node is dynamic one. Otherwise, the device node as well
as the pdn won't be released.
Fixes: cca87d30 ("powerpc/pci: Refactor pci_dn")
Cc: stable@vger.kernel.org # 4.1+
Reported-by: Santwana Samantray <santwana.samantray@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull powerpc fixes from Michael Ellerman:
"Fix MSI/MSI-X on pseries from Guilherme"
* tag 'powerpc-4.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/PCI: Disable MSI/MSI-X interrupts at PCI probe time in OF case
PCI: Make pci_msi_setup_pci_dev() non-static for use by arch code
Pull networking fixes from David Miller:
"Some straggler bug fixes here:
1) Netlink_sendmsg() doesn't check iterator type properly in mmap
case, from Ken-ichirou MATSUZAWA.
2) Don't sleep in atomic context in bcmgenet driver, from Florian
Fainelli.
3) The pfkey_broadcast() code patch can't actually ever use anything
other than GFP_ATOMIC. And the cases that right now pass
GFP_KERNEL or similar will currently trigger an RCU splat. Just
use GFP_ATOMIC unconditionally. From David Ahern.
4) Fix FD bit timings handling in pcan_usb driver, from Marc
Kleine-Budde.
5) Cache dst leaked in ip6_gre tunnel removal, fix from Huaibin Wang.
6) Traversal into drivers/net/ethernet/renesas should be triggered by
CONFIG_NET_VENDOR_RENESAS, not a particular driver's config
option. From Kazuya Mizuguchi.
7) Fix regression in handling of igmp_join errors in vxlan, from
Marcelo Ricardo Leitner.
8) Make phy_{read,write}_mmd_indirect() properly take the mdio_lock
mutex when programming the registers. From Russell King.
9) Fix non-forced handling in u32_destroy(), from WANG Cong.
10) Test the EVENT_NO_RUNTIME_PM flag before it is cleared in
usbnet_stop(), from Eugene Shatokhin.
11) In sfc driver, don't fetch statistics firmware isn't capable of,
from Bert Kenward.
12) Verify ASCONF address parameter location in SCTP, from Xin Long"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
sctp: donot reset the overall_error_count in SHUTDOWN_RECEIVE state
sctp: asconf's process should verify address parameter is in the beginning
sfc: only use vadaptor stats if firmware is capable
net: phy: fixed: propagate fixed link values to struct
usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared
drivers: net: xgene: fix: Oops in linkwatch_fire_event
cls_u32: complete the check for non-forced case in u32_destroy()
net: fec: use reinit_completion() in mdio accessor functions
net: phy: add locking to phy_read_mmd_indirect()/phy_write_mmd_indirect()
vxlan: re-ignore EADDRINUSE from igmp_join
net: compile renesas directory if NET_VENDOR_RENESAS is configured
ip6_gre: release cached dst on tunnel removal
phylib: Make PHYs children of their MDIO bus, not the bus' parent.
can: pcan_usb: don't provide CAN FD bittimings by non-FD adapters
net: Fix RCU splat in af_key
net: bcmgenet: fix uncleaned dma flags
net: bcmgenet: Avoid sleeping in bcmgenet_timeout
netlink: mmap: fix tx type check
Pull nvdimm fixlet from Dan Williams:
"This is a libnvdimm ABI fixup.
I pushed back on this change quite hard given the late date, that it
appears to be purely cosmetic, sysfs is not necessarily meant to be a
user friendly UI, and the kernel interprets the reversed polarity of
the ACPI_NFIT_MEM_ARMED flag correctly. When this flag is set, the
energy source of an NVDIMM is not armed and any new writes to the DIMM
may not be preserved.
However, Bob Moore warned me that it is important to get these things
named correctly wherever they appear otherwise we run the risk of a
less than cautious firmware engineer implementing the polarity the
wrong way. Once a mistake like that escapes into production platforms
the flag becomes useless and we need to move to a new bit position.
Bob has agreed to take a change through ACPICA to rename
ACPI_NFIT_MEM_ARMED to ACPI_NFIT_MEM_NOT_ARMED, and the patch below
from Toshi brings the sysfs representation of these flags in line with
their respective polarities.
Please pull for 4.2 as this is the first kernel to expose the ACPI
NFIT sysfs representation, and this is likely a kernel that firmware
developers will be using for checking out their NVDIMM enabling"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nfit: Clarify memory device state flags strings
According to the flexfiles protocol, the layoutreturn should specify an
array of errors in the following format:
struct ff_ioerr4 {
offset4 ffie_offset;
length4 ffie_length;
stateid4 ffie_stateid;
device_error4 ffie_errors<>;
};
This patch fixes up the code to ensure that our ffie_errors is indeed
encoded as an array (albeit with only a single entry).
Reported-by: Tom Haynes <thomas.haynes@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Phil Sutter says:
====================
fixup IFF_NO_QUEUE conversion
This series serves two purposes:
On one hand it fixes a quite embarrassing bug around the warning I added for
drivers still setting tx_queue_len = 0 to achieve noqueue operation. It turned
out to be quite useless as due to using alloc_netdev(), many in-kernel drivers
fell into the trap by accident, as well. Instead this place serves pretty well
as a sanitizing point to set IFF_NO_QUEUE for drivers not initializing
tx_queue_len, which in turn allows to drop all special treatment of the latter
being zero since that can not happen anymore without IFF_NO_QUEUE being set.
On the other hand, it provides a better solution for Eric Dumazet's concern
regarding how to assign noqueue to an interface which does not default to it
already. In order to make this possible, noqueue is being registered so users
can 'tc qd add dev eth0 root noqueue'. In addition, it resolves the ugly
situation of 'tc qd show' not showing noqueue. Finally, the former changes
allow for some code cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that noqueue qdisc can be attached just like any other qdisc, no
special treatment is necessary anymore when attaching it as default
qdisc.
This change has the added benefit that 'tc qdisc show' prints noqueue
instead of nothing for devices defaulting to noqueue.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
This way users can attach noqueue just like any other qdisc using tc
without having to mess with tx_queue_len first.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since alloc_netdev_mqs() sets IFF_NO_QUEUE for drivers not initializing
tx_queue_len, it is safe to assume that if tx_queue_len is zero,
dev->priv flags always contains IFF_NO_QUEUE.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Printing a warning in alloc_netdev_mqs() if tx_queue_len is zero and
IFF_NO_QUEUE not set is not appropriate since drivers may use one of the
alloc_netdev* macros instead of alloc_etherdev*, thereby not
intentionally leaving tx_queue_len uninitialized. Instead check here if
tx_queue_len is zero and set IFF_NO_QUEUE, so the value of tx_queue_len
can be ignored in net/sched_generic.c.
Fixes: 906470c ("net: warn if drivers set tx_queue_len = 0")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
The symbol '__sk_reclaim' is not present in the current tree. Apparently
'__sk_reclaim' was meant to be '__sk_mem_reclaim', so fix it with the
right symbol name for the kernel doc.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f8d9605243 ("sctp: Enforce retransmission limit during shutdown")
fixed a problem with excessive retransmissions in the SHUTDOWN_PENDING by not
resetting the association overall_error_count. This allowed the association
to better enforce assoc.max_retrans limit.
However, the same issue still exists when the association is in SHUTDOWN_RECEIVED
state. In this state, HB-ACKs will continue to reset the overall_error_count
for the association would extend the lifetime of association unnecessarily.
This patch solves this by resetting the overall_error_count whenever the current
state is small then SCTP_STATE_SHUTDOWN_PENDING. As a small side-effect, we
end up also handling SCTP_STATE_SHUTDOWN_ACK_SENT and SCTP_STATE_SHUTDOWN_SENT
states, but they are not really impacted because we disable Heartbeats in those
states.
Fixes: Commit f8d9605243 ("sctp: Enforce retransmission limit during shutdown")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Client sends a SETATTR request after OPEN for updating attributes.
For create file with S_ISGID is set, the S_ISGID in SETATTR will be
ignored at nfs server as chmod of no PERMISSION.
v3, same as v2.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Create file with attributs as NFS4_CREATE_EXCLUSIVE4_1 mode
depends on suppattr_exclcreat attribut.
v3, same as v2.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
v4.1/v4.2 have define attributes at word2, nfs client also support
security label now.
v3, same as v2.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Check opened, only update it when non-NULL.
It's not needs define an unused value for the opened
when calling _nfs4_do_open.
v3, same as v2.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Set rlimit for NFS's files is useless right now.
For local process's rlimit, it should be checked by nfs client.
The same, CIFS also call inode_change_ok checking rlimit at its client
in cifs_setattr_nounix() and cifs_setattr_unix().
v3, fix bad using of error
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Given that a write-back (WB) mapping plus non-temporal stores is
expected to be the most efficient way to access PMEM, update the
definition of ARCH_HAS_PMEM_API to imply arch support for
WB-mapped-PMEM. This is needed as a pre-requisite for adding PMEM to
the direct map and mapping it with struct page.
The above clarification for X86_64 means that memcpy_to_pmem() is
permitted to use the non-temporal arch_memcpy_to_pmem() rather than
needlessly fall back to default_memcpy_to_pmem() when the pcommit
instruction is not available. When arch_memcpy_to_pmem() is not
guaranteed to flush writes out of cache, i.e. on older X86_32
implementations where non-temporal stores may just dirty cache,
ARCH_HAS_PMEM_API is simply disabled.
The default fall back for persistent memory handling remains. Namely,
map it with the WT (write-through) cache-type and hope for the best.
arch_has_pmem_api() is updated to only indicate whether the arch
provides the proper helpers to meet the minimum "writes are visible
outside the cache hierarchy after memcpy_to_pmem() + wmb_pmem()". Code
that cares whether wmb_pmem() actually flushes writes to pmem must now
call arch_has_wmb_pmem() directly.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
[hch: set ARCH_HAS_PMEM_API=n on x86_32]
Reviewed-by: Christoph Hellwig <hch@lst.de>
[toshi: x86_32 compile fixes]
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This behaves like devm_memremap except that it ensures we have page
structures available that can back the region.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[djbw: catch attempts to remap RAM, drop flags]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
While pmem is usable as a block device or via DAX mappings to userspace
there are several usage scenarios that can not target pmem due to its
lack of struct page coverage. In preparation for "hot plugging" pmem
into the vmemmap add ZONE_DEVICE as a new zone to tag these pages
separately from the ones that are subject to standard page allocations.
Importantly "device memory" can be removed at will by userspace
unbinding the driver of the device.
Having a separate zone prevents allocation and otherwise marks these
pages that are distinct from typical uniform memory. Device memory has
different lifetime and performance characteristics than RAM. However,
since we have run out of ZONES_SHIFT bits this functionality currently
depends on sacrificing ZONE_DMA.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Jerome Glisse <j.glisse@gmail.com>
[hch: various simplifications in the arch interface]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Three architectures already define these, and we'll need them genericly
soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
None of the implementations currently use it. The common
bdev_direct_access() entry point handles all the size checks before
calling ->direct_access().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>