kvm master clock usually has a different frequency than the kernel boot
clock. This is not a problem until the master clock is updated;
update uses the current kernel boot clock to compute new kvm clock,
which erases any kvm clock cycles that might have built up due to
frequency difference over a long period.
KVM_SET_CLOCK is one of places where we can safely update master clock
as the guest-visible clock is going to be shifted anyway.
The problem with current code is that it updates the kvm master clock
after updating the offset. If the master clock was enabled before
calling KVM_SET_CLOCK, then it might have built up a significant delta
from kernel boot clock.
In the worst case, the time set by userspace would be shifted by so much
that it couldn't have been set at any point during KVM_SET_CLOCK.
To fix this, move kvm_gen_update_masterclock() before computing
kvmclock_offset, which means that the master clock and kernel boot clock
will be sufficiently close together.
Another solution would be to replace get_kvmclock_ns() with
"ktime_get_boot_ns() + ka->kvmclock_offset", which is marginally more
accurate, but would break symmetry with KVM_GET_CLOCK.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Factoring ctime into the nfsv4 change attribute gives us better
properties than just i_version alone.
Eventually we'll likely also expose this (as opposed to raw i_version)
to userspace, at which point we'll want to move it to a common helper,
called from either userspace or individual filesystems. For now, nfsd
is the only user.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up: No need to save the I/O direction. The functions that
release svc_rdma_chunk_ctxt already know what direction to use.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up: Registration mode details are now handled by the rdma_rw
API, and thus can be removed from svcrdma.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There's no longer a need to compare each SGE's lkey with the PD's
local_dma_lkey. Now that FRWR is gone, all DMA mappings are for
pages that were registered with this key.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up: Now that the svc_rdma_recvfrom path uses the rdma_rw API,
the details of Read sink buffer registration are dealt with by the
kernel's RDMA core. This cache is no longer used, and can be
removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up:
The generic RDMA R/W API conversion of svc_rdma_recvfrom replaced
the Register, Read, and Invalidate completion handlers. Remove the
old ones, which are no longer used.
These handlers shared some helper code with svc_rdma_wc_send. Fold
the wc_common helper back into the one remaining completion handler.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When an RPC-over-RDMA request is received, the Receive buffer
contains a Transport Header possibly followed by an RPC message.
Even though rq_arg.head[0] (as passed to NFSD) does not contain the
Transport Header header, currently rq_arg.len includes the size of
the Transport Header.
That violates the intent of the xdr_buf API contract. .buflen should
include everything, but .len should be exactly the length of the RPC
message in the buffer.
The rq_arg fields are summed together at the end of
svc_rdma_recvfrom to obtain the correct return value. rq_arg.len
really ought to contain the correct number of bytes already, but it
currently doesn't due to the above misbehavior.
Let's instead ensure that .buflen includes the length of the
transport header, and that .len is always equal to head.iov_len +
.page_len + tail.iov_len .
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The current svcrdma recvfrom code path has a lot of detail about
registration mode and the type of port (iWARP, IB, etc).
Instead, use the RDMA core's generic R/W API. This shares code with
other RDMA-enabled ULPs that manages the gory details of buffer
registration and the posting of RDMA Read Work Requests.
Since the Read list marshaling code is being replaced, I took the
opportunity to replace C structure-based XDR encoding code with more
portable code that uses pointer arithmetic.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
svc_rdma_rw.c already contains helpers for the sendto path.
Introduce helpers for the recvfrom path.
The plan is to replace the local NFSD bespoke code that constructs
and posts RDMA Read Work Requests with calls to the rdma_rw API.
This shares code with other RDMA-enabled ULPs that manages the gory
details of buffer registration and posting Work Requests.
This new code also puts all RDMA_NOMSG-specific logic in one place.
Lastly, the use of rqstp->rq_arg.pages is deprecated in favor of
using rqstp->rq_pages directly, for clarity.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
svcrdma needs 259 pages allocated to receive 1MB NFSv4.0 WRITE requests:
- 1 page for the transport header and head iovec
- 256 pages for the data payload
- 1 page for the trailing GETATTR request (since NFSD XDR decoding
does not look for a tail iovec, the GETATTR is stuck at the end
of the rqstp->rq_arg.pages list)
- 1 page for building the reply xdr_buf
But RPCSVC_MAXPAGES is already 259 (on x86_64). The problem is that
svc_alloc_arg never allocates that many pages. To address this:
1. The final element of rq_pages always points to NULL. To
accommodate up to 259 pages in rq_pages, add an extra element
to rq_pages for the array termination sentinel.
2. Adjust the calculation of "pages" to match how RPCSVC_MAXPAGES
is calculated, so it can go up to 259. Bruce noted that the
calculation assumes sv_max_mesg is a multiple of PAGE_SIZE,
which might not always be true. I didn't change this assumption.
3. Change the loop boundaries to allow 259 pages to be allocated.
Additional clean-up: WARN_ON_ONCE adds an extra conditional branch,
which is basically never taken. And there's no need to dump the
stack here because svc_alloc_arg has only one caller.
Keeping that NULL "array termination sentinel"; there doesn't appear to
be any code that depends on it, only code in nfsd_splice_actor() which
needs the 259th element to be initialized to *something*. So it's
possible we could just keep the array at 259 elements and drop that
final NULL, but we're being conservative for now.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
pci_scan_root_bus_bridge() returns zero for success, or a negative errno.
A typo in ae13cb9b19 ("PCI: rockchip: Convert PCI scan API to
pci_scan_root_bus_bridge()") treated zero as a failure.
Fix the typo.
Fixes: ae13cb9b19 ("PCI: rockchip: Convert PCI scan API to pci_scan_root_bus_bridge()")
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Few GLK platform variants report a different vendor id. Add it.
Also add the missing check for GLK in is_haswell_plus().
Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull i2c updates from Wolfram Sang:
"This pull request contains:
- i2c core reorganization. One source file became too monolithic. It
is now split up, yet we still have the same named object as the
final output. This should ease maintenance.
- new drivers: ZTE ZX2967 family, ASPEED 24XX/25XX
- designware driver gained slave mode support
- xgene-slimpro driver gained ACPI support
- bigger overhaul for pca-platform driver
- the algo-bit module now supports messages with enforced STOP
- slightly bigger than usual set of driver updates and improvements
and with much appreciated quality assurance from Andy Shevchenko"
* 'i2c/for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (51 commits)
i2c: Provide a stub for i2c_detect_slave_mode()
i2c: designware: Let slave adapter support be optional
i2c: designware: Make HW init functions static
i2c: designware: fix spelling mistakes
i2c: pca-platform: propagate error from i2c_pca_add_numbered_bus
i2c: pca-platform: correctly set algo_data.reset_chip
i2c: acpi: Do not create i2c-clients for LNXVIDEO ACPI devices
i2c: designware: enable SLAVE in platform module
i2c: designware: add SLAVE mode functions
i2c: zx2967: drop COMPILE_TEST dependency
i2c: zx2967: always use the same device when printing errors
i2c: pca-platform: use dev_warn/dev_info instead of printk
i2c: pca-platform: use device managed allocations
i2c: pca-platform: add devicetree awareness
i2c: pca-platform: switch to struct gpio_desc
dt-bindings: add bindings for i2c-pca-platform
i2c: cadance: fix ctrl/addr reg write order
i2c: zx2967: add i2c controller driver for ZTE's zx2967 family
dt: bindings: add documentation for zx2967 family i2c controller
i2c: algo-bit: add support for I2C_M_STOP
...
Pull IOMMU updates from Joerg Roedel:
"This update comes with:
- Support for lockless operation in the ARM io-pgtable code.
This is an important step to solve the scalability problems in the
common dma-iommu code for ARM
- Some Errata workarounds for ARM SMMU implemenations
- Rewrite of the deferred IO/TLB flush code in the AMD IOMMU driver.
The code suffered from very high flush rates, with the new
implementation the flush rate is down to ~1% of what it was before
- Support for amd_iommu=off when booting with kexec.
The problem here was that the IOMMU driver bailed out early without
disabling the iommu hardware, if it was enabled in the old kernel
- The Rockchip IOMMU driver is now available on ARM64
- Align the return value of the iommu_ops->device_group call-backs to
not miss error values
- Preempt-disable optimizations in the Intel VT-d and common IOVA
code to help Linux-RT
- Various other small cleanups and fixes"
* tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (60 commits)
iommu/vt-d: Constify intel_dma_ops
iommu: Warn once when device_group callback returns NULL
iommu/omap: Return ERR_PTR in device_group call-back
iommu: Return ERR_PTR() values from device_group call-backs
iommu/s390: Use iommu_group_get_for_dev() in s390_iommu_add_device()
iommu/vt-d: Don't disable preemption while accessing deferred_flush()
iommu/iova: Don't disable preempt around this_cpu_ptr()
iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126
iommu/arm-smmu-v3: Enable ACPI based HiSilicon CMD_PREFETCH quirk(erratum 161010701)
iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74
ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3 model
iommu/arm-smmu-v3, acpi: Add temporary Cavium SMMU-V3 IORT model number definitions
iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing table
iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE
iommu/arm-smmu-v3: Remove io-pgtable spinlock
iommu/arm-smmu: Remove io-pgtable spinlock
iommu/io-pgtable-arm-v7s: Support lockless operation
iommu/io-pgtable-arm: Support lockless operation
iommu/io-pgtable: Introduce explicit coherency
iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
...
Inconsistencies result from shadowing only accesses to the full
64-bits of a 64-bit VMCS field, but not shadowing accesses to the high
32-bits of the field. The "high" part of a 64-bit field should be
shadowed whenever the full 64-bit field is shadowed.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allow the L1 guest to specify the last page of addressable guest
physical memory for an L2 MSR permission bitmap. Also remove the
vmcs12_read_any() check that should never fail.
Fixes: 3af18d9c5f ("KVM: nVMX: Prepare for using hardware MSR bitmap")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to the SDM, if the "use I/O bitmaps" VM-execution control is
1, bits 11:0 of each I/O-bitmap address must be 0. Neither address
should set any bits beyond the processor's physical-address width.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The VMCS launch state is not set to "launched" unless the VMLAUNCH
actually succeeds. VMLAUNCH failure includes VM-exits with bit 31 set.
Note that this change does not address the general problem that a
failure to launch/resume vmcs02 (i.e. vmx->fail) is not handled
correctly.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull overlayfs updates from Miklos Szeredi:
"This work from Amir introduces the inodes index feature, which
provides:
- hardlinks are not broken on copy up
- infrastructure for overlayfs NFS export
This also fixes constant st_ino for samefs case for lower hardlinks"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (33 commits)
ovl: mark parent impure and restore timestamp on ovl_link_up()
ovl: document copying layers restrictions with inodes index
ovl: cleanup orphan index entries
ovl: persistent overlay inode nlink for indexed inodes
ovl: implement index dir copy up
ovl: move copy up lock out
ovl: rearrange copy up
ovl: add flag for upper in ovl_entry
ovl: use struct copy_up_ctx as function argument
ovl: base tmpfile in workdir too
ovl: factor out ovl_copy_up_inode() helper
ovl: extract helper to get temp file in copy up
ovl: defer upper dir lock to tempfile link
ovl: hash overlay non-dir inodes by copy up origin
ovl: cleanup bad and stale index entries on mount
ovl: lookup index entry for copy up origin
ovl: verify index dir matches upper dir
ovl: verify upper root dir matches lower root dir
ovl: introduce the inodes index dir feature
ovl: generalize ovl_create_workdir()
...
The vdisplay variable is now only accessed inside of an #ifdef, producing
a harmless warning:
drivers/video/fbdev/aty/atyfb_base.c: In function 'aty_var_to_crtc':
drivers/video/fbdev/aty/atyfb_base.c:805:19: error: unused variable 'vdisplay' [-Werror=unused-variable]
This moves the declaration into the ifdef as well.
Fixes: dd7d958ae9 ("video: fbdev: aty: remove useless variable assignments in aty_var_to_crtc()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: "Gustavo A. R. Silva" <garsilva@embeddedor.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
This bioset is used for allocating bios with nr_iovecs > 0 so this flag
must be set.
Fixes: 011067b056 ("blk: replace bioset_create_nobvec() with a flags arg to bioset_create()")
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
The lower level nl80211 code in cfg80211 ensures that "len" is between
25 and NL80211_ATTR_FRAME (2304). We subtract DOT11_MGMT_HDR_LEN (24) from
"len" so thats's max of 2280. However, the action_frame->data[] buffer is
only BRCMF_FIL_ACTION_FRAME_SIZE (1800) bytes long so this memcpy() can
overflow.
memcpy(action_frame->data, &buf[DOT11_MGMT_HDR_LEN],
le16_to_cpu(action_frame->len));
Cc: stable@vger.kernel.org # 3.9.x
Fixes: 18e2f61db3 ("brcmfmac: P2P action frame tx.")
Reported-by: "freenerguo(郭大兴)" <freenerguo@tencent.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When hns port type is not debug mode, netif_tx_disable is called
when there is a tx timeout, which requires system reboot to return
to normal state. This patch fix this problem by resetting the net
dev.
Fixes: b5996f11ea ("net: add Hisilicon Network Subsystem basic ethernet support")
Signed-off-by: Lin Yun Sheng <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We accidentally free a NULL pointer and leak the pointer we want to
free. Also you can tell from the label name what was intended. :)
Fixes: abfcdc1de9 ("nfp: add a stats handler for flower offloads")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw: spectrum: Various fixes
First patch adds a missing rollback in error path. Second patch prevents
a use-after-free during IPv4 route replace. Last two patches fix warnings
from static checkers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We can't rely on kzalloc() always succeeding, so check its return value.
Suppresses the following smatch error:
mlxsw_sp_switchdev_event() error: potential null dereference
'switchdev_work->fdb_info.addr'. (kzalloc returns
null)
Fixes: af06137892 ("mlxsw: spectrum_switchdev: Add support for learning FDB through notification")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 10e23eb299 ("mlxsw: spectrum: Remove support for bypass bridge
port attributes/vlan set") removed statements that used 'bridge_vlan',
but didn't remove the variable itself resulting in the following warning
with W=1:
warning: variable ‘bridge_vlan’ set but not used
[-Wunused-but-set-variable]
Remove the variable and suppress the warning.
Fixes: 10e23eb299 ("mlxsw: spectrum: Remove support for bypass bridge port attributes/vlan set")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While working on IPv6 route replace I realized we can have a
use-after-free in IPv4 in case the replaced route is offloaded and the
only one using its FIB info.
The problem is that fib_table_insert() drops the reference on the FIB
info of the replaced routes which is eventually freed via call_rcu().
Since the driver doesn't hold a reference on this FIB info it can cause
a use-after-free when it tries to clear the RTNH_F_OFFLOAD flag stored
in fi->fib_flags.
After running the following commands in a loop for enough time with a
KASAN enabled kernel I finally got the below trace.
$ ip route add 192.168.50.0/24 via 192.168.200.1 dev enp3s0np3
$ ip route replace 192.168.50.0/24 dev enp3s0np5
$ ip route del 192.168.50.0/24 dev enp3s0np5
BUG: KASAN: use-after-free in mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
Read of size 4 at addr ffff8803717d9820 by task kworker/u4:2/55
[...]
? mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
? mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
? mlxsw_sp_router_neighs_update_work+0x1cd0/0x1ce0 [mlxsw_spectrum]
? mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
__asan_load4+0x61/0x80
mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
mlxsw_sp_fib_entry_offload_refresh+0xb6/0x370 [mlxsw_spectrum]
mlxsw_sp_router_fib_event_work+0xd1c/0x2780 [mlxsw_spectrum]
[...]
Freed by task 5131:
save_stack_trace+0x16/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x70/0xc0
kfree+0x144/0x570
free_fib_info_rcu+0x2e7/0x410
rcu_process_callbacks+0x4f8/0xe30
__do_softirq+0x1d3/0x9e2
Fix this by taking a reference on the FIB info when creating the nexthop
group it represents and drop it when the group is destroyed.
Fixes: 599cf8f95f ("mlxsw: spectrum_router: Add support for route replace")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch the error path of mlxsw_sp_nexthop_init() is symmetric
with mlxsw_sp_nexthop_fini(). Noticed during code review.
Fixes: a8c9701427 ("mlxsw: spectrum_router: Refactor nexthop init routine")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
64bit DMA only supported on sun4v equipped with ATU IOMMU HW.
'Commit b02c2b0bfd ("sparc: remove arch specific dma_supported
implementations")' introduced a code that incorrectly allow
dma_supported() to succeed for 64bit dma mask even if system doesn't
have ATU IOMMU. This results into panic.
Fix it.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Which is the case in S/390, where symbols were not being resolved
because machine__get_kernel_start was only setting machine->kernel_start
when the just successfully loaded kernel symtab had its map->start set
to !0, when it was left at (1ULL << 63) assuming a partitioning of the
address space for user/kernel, which is not the case in S/390 nor in
Sparc.
So just check if map__load() was successfull and set
machine->kernel_start to zero, fixing kernel symbol resolution on S/390.
Test performed by Thomas:
----
I like this patch. I have done a new build and removed all my debug output to start
from scratch. Without your patch I get this:
# Samples: 4 of event 'cpu-clock'
# Event count (approx.): 1000000
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................ ........................
75.00% 0.00% true [unknown] [k] 0x00000000004bedda
|
---0x4bedda
|
|--50.00%--0x42693a
| |
| --25.00%--0x2a72e0
| 0x2af0ca
| 0x3d1003fe4c0
|
--25.00%--0x4272bc
0x26fa84
and with your patch (I just rebuilt the perf tool, nothing else and used the same
perf.data file as input):
# Samples: 4 of event 'cpu-clock'
# Event count (approx.): 1000000
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... .......................... ..................................
75.00% 0.00% true [kernel.vmlinux] [k] pgm_check_handler
|
---pgm_check_handler
do_dat_exception
handle_mm_fault
__handle_mm_fault
filemap_map_pages
|
|--25.00%--rcu_read_lock_held
| rcu_lockdep_current_cpu_online
| 0x3d1003ff4c0
|
--25.00%--lock_release
Looks good to me....
----
Reported-and-Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
Link: http://lkml.kernel.org/n/tip-dk0n1uzmbe0tbthrpfqlx6bz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are mq devices (eg., virtio-blk, nbd and loopback) which don't
invoke blk_mq_run_hw_queues() after the completion of a request.
If bfq is enabled on these devices and the slice_idle attribute or
strict_guarantees attribute is set as zero, it is possible that
after a request completion the remaining requests of busy bfq queue
will stalled in the bfq schedule until a new request arrives.
To fix the scheduler latency problem, we need to check whether or not
all issued requests have completed and dispatch more requests to driver
if there is no request in driver.
The problem can be reproduced by running the following script
on a virtio-blk device with nr_hw_queues as 1:
#!/bin/sh
dev=vdb
# mount point for dev
mp=/tmp/mnt
cd $mp
job=strict.job
cat <<EOF > $job
[global]
direct=1
bs=4k
size=256M
rw=write
ioengine=libaio
iodepth=128
runtime=5
time_based
[1]
filename=1.data
[2]
new_group
filename=2.data
EOF
echo bfq > /sys/block/$dev/queue/scheduler
echo 1 > /sys/block/$dev/queue/iosched/strict_guarantees
fio $job
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The start time of eligible entity should be less than or equal to
the current virtual time, and the entity in idle tree has a finish
time being greater than the current virtual time.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
My static checker complains that if "func" is NULL then "clear_filter"
is uninitialized. This seems like it could be true, although it's
possible something subtle is happening that I haven't seen.
kernel/trace/ftrace.c:3844 match_records()
error: uninitialized symbol 'clear_filter'.
Link: http://lkml.kernel.org/r/20170712073556.h6tkpjcdzjaozozs@mwanda
Cc: stable@vger.kernel.org
Fixes: f0a3b154bd ("ftrace: Clarify code for mod command")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
With a shared policy in place, when one of the CPUs in the policy is
hotplugged out and then brought back online, sugov_stop() and
sugov_start() are called in order.
sugov_stop() removes utilization hooks for each CPU in the policy and
does nothing else in the for_each_cpu() loop. sugov_start() on the
other hand iterates through the CPUs in the policy and re-initializes
the per-cpu structure _and_ adds the utilization hook. This implies
that the scheduler is allowed to invoke a CPU's utilization update
hook when the rest of the per-cpu structures have yet to be
re-inited.
Apart from some strange values in tracepoints this doesn't cause a
problem, but if we do end up accessing a pointer from the per-cpu
sugov_cpu structure somewhere in the sugov_update_shared() path,
we will likely see crashes since the memset for another CPU in the
policy is free to race with sugov_update_shared from the CPU that is
ready to go. So let's fix this now to first init all per-cpu
structures, and then add the per-cpu utilization update hooks all at
once.
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the current code, if the user accidentally writes a bogus command to
this sysfs file, then we set the latency tolerance to an uninitialized
variable.
Fixes: 2d984ad132 (PM / QoS: Introcuce latency tolerance device PM QoS type)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When the minimum performance limit percentage is set to the power-up
default, it is possible that minimum performance ratio is off by one.
In the set_policy() callback the minimum ratio is calculated by
applying global.min_perf_pct to turbo_ratio and rounding up, but the
power-up default global.min_perf_pct is already rounded up to the
next percent in min_perf_pct_min(). That results in two round up
operations, so for the default min_perf_pct one of them is not
required.
It is better to remove rounding up in min_perf_pct_min() as this
matches the displayed min_perf_pct prior to commit c5a2ee7dde
(cpufreq: intel_pstate: Active mode P-state limits rework) in 4.12.
For example on a platform with max turbo ratio of 37 and minimum
ratio of 10, the min_perf_pct resulted in 28 with the above commit.
Before this commit it was 27 and it will be the same after this
change.
Fixes: 1a4fe38add (cpufreq: intel_pstate: Remove max/min fractions to limit performance)
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Although it's not documented anywhere, there is an expectation that
atomic64_inc_not_zero() returns a result which fits in an int. This is
the behaviour implemented on all arches except powerpc.
This has caused at least one bug in practice, in the percpu-refcount
code, where the long result from our atomic64_inc_not_zero() was
truncated to an int leading to lost references and stuck systems. That
was worked around in that code in commit 966d2b04e0 ("percpu-refcount:
fix reference leak during percpu-atomic transition").
To the best of my grepping abilities there are no other callers
in-tree which truncate the value, but we should fix it anyway. Because
the breakage is subtle and potentially very harmful I'm also tagging
it for stable.
Code generation is largely unaffected because in most cases the
callers are just using the result for a test anyway. In particular the
case of fget() that was mentioned in commit a6cf7ed511
("powerpc/atomic: Implement atomic*_inc_not_zero") generates exactly
the same code.
Fixes: a6cf7ed511 ("powerpc/atomic: Implement atomic*_inc_not_zero")
Cc: stable@vger.kernel.org # v3.4
Noticed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
fwnode_call_int_op() isn't suitable for calling ops that return bool
since it effectively causes the result returned to the user to be
true when an op hasn't been defined or the fwnode is NULL.
Address this by introducing fwnode_call_bool_op() for calling ops
that return bool.
Fixes: 3708184afc "device property: Move FW type specific functionality to FW specific files"
Fixes: 2294b3af05 "device property: Introduce fwnode_device_is_available()"
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The GPD win BIOS dated 20170320 has disabled the accelerometer, the
drivers sometimes cause crashes under Windows and this is how the
manufacturer has solved this :|
I see no other way to keep the accelerometer working under Windows then
adding it to the always_present_ids array.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The _STA method of the Venue 11 Pro 7130 touchscreen has this ugliness:
Method (_STA, 0, NotSerialized) // _STA: Status
{
If ((SDS1 & One) == One)
{
If (RST1 == Zero)
{
Return (0x0F)
}
ElseIf (RST2 == Zero)
{
RST2 = One
TMRV = Timer
}
Else
{
Local0 = ((Timer - TMRV) / 0x2710)
If (Local0 > TMRI)
{
RST2 = Zero
RST1 = Zero
}
}
}
Else
{
Return (Zero)
}
}
Whereby RST1 gets set by _SB.PCI0.GFX0.LCD.LCD1._ON, this means that
after RST1 has been set first _STA must be called to set TIMER and
then after enough time has elapsed _STA must be called twice more, once
to clear RST1 and once to finally return 0xf before the touchscreen will
show up. Which is just crazy.
This commit adds an always_present_ids entry for the SYNA7500 touchscreen
ACPI node, together with a DMI match for the Venue 11 Pro 7130, fixing the
touchscreen not working on this device.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On some x86 systems the DSDT hides APCI devices to work around Windows
driver bugs. On one such system the device is even hidden until a certain
time after _SB.PCI0.GFX0.LCD.LCD1._ON gets called has passed *and*
_STA has been called at least 3 times since. TL;DR: it is a mess.
Until now the always_present_id matching was used to force status
for a whole class of devices, e.g. always enable PWM1 on CHerry Trail
devices.
This commit extends the always_present_id matching code to optionally
also check for a DMI match so that we can also add system specific
quirks to the always_present_id array.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On Lenovo ThinkPad X1 Carbon - the 5th Generation, enabling an earlier
EC event freezing timing causes acpitz-virtual-0 to report a stuck
48C temparature. And with EC firmware revisioned as 1.14, without
reverting back to old EC event freezing timing, the fan still blows
up after a system resume.
This reverts the culprit change so that the regression can be fixed
without upgrading the EC firmware.
Fixes: d30283057e (ACPI / EC: Enable event freeze mode to improve event handling)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191181#c168
Tested-by: Damjan Georgievski <gdamjan@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>