Francois reported that VMware balloon gets stuck after a balloon reset,
when the VMCI doorbell is removed. A similar error can occur when the
balloon driver is removed with the following splat:
[ 1088.622000] INFO: task modprobe:3565 blocked for more than 120 seconds.
[ 1088.622035] Tainted: G W 5.2.0 #4
[ 1088.622087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.622205] modprobe D 0 3565 1450 0x00000000
[ 1088.622210] Call Trace:
[ 1088.622246] __schedule+0x2a8/0x690
[ 1088.622248] schedule+0x2d/0x90
[ 1088.622250] schedule_timeout+0x1d3/0x2f0
[ 1088.622252] wait_for_completion+0xba/0x140
[ 1088.622320] ? wake_up_q+0x80/0x80
[ 1088.622370] vmci_resource_remove+0xb9/0xc0 [vmw_vmci]
[ 1088.622373] vmci_doorbell_destroy+0x9e/0xd0 [vmw_vmci]
[ 1088.622379] vmballoon_vmci_cleanup+0x6e/0xf0 [vmw_balloon]
[ 1088.622381] vmballoon_exit+0x18/0xcc8 [vmw_balloon]
[ 1088.622394] __x64_sys_delete_module+0x146/0x280
[ 1088.622408] do_syscall_64+0x5a/0x130
[ 1088.622410] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1088.622415] RIP: 0033:0x7f54f62791b7
[ 1088.622421] Code: Bad RIP value.
[ 1088.622421] RSP: 002b:00007fff2a949008 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 1088.622426] RAX: ffffffffffffffda RBX: 000055dff8b55d00 RCX: 00007f54f62791b7
[ 1088.622426] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055dff8b55d68
[ 1088.622427] RBP: 000055dff8b55d00 R08: 00007fff2a947fb1 R09: 0000000000000000
[ 1088.622427] R10: 00007f54f62f5cc0 R11: 0000000000000206 R12: 000055dff8b55d68
[ 1088.622428] R13: 0000000000000001 R14: 000055dff8b55d68 R15: 00007fff2a94a3f0
The cause for the bug is that when the "delayed" doorbell is invoked, it
takes a reference on the doorbell entry and schedules work that is
supposed to run the appropriate code and drop the doorbell entry
reference. The code ignores the fact that if the work is already queued,
it will not be scheduled to run one more time. As a result one of the
references would not be dropped. When the code waits for the reference
to get to zero, during balloon reset or module removal, it gets stuck.
Fix it. Drop the reference if schedule_work() indicates that the work is
already queued.
Note that this bug got more apparent (or apparent at all) due to
commit ce664331b2 ("vmw_balloon: VMCI_DOORBELL_SET does not check status").
Fixes: 83e2ec765b ("VMCI: doorbell implementation.")
Reported-by: Francois Rigault <rigault.francois@gmail.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Adit Ranadive <aditr@vmware.com>
Cc: Alexios Zavras <alexios.zavras@intel.com>
Cc: Vishnu DASA <vdasa@vmware.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Link: https://lore.kernel.org/r/20190820202638.49003-1-namit@vmware.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kees writes:
Updates to LKDTM for -next
- split WARNING into two tests: with message and without
- add prototype-granularity forward CFI test
* tag 'lkdtm-next' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lkdtm: Split WARNING into separate tests
lkdtm: Add Control Flow Integrity test
These structs have holes in them so we end up disclosing a few bytes of
uninitialized stack data.
drivers/misc/xilinx_sdfec.c:305 xsdfec_get_status() warn: check that 'status' doesn't leak information (struct has a hole after 'activity')
drivers/misc/xilinx_sdfec.c:449 xsdfec_get_turbo() warn: check that 'turbo_params' doesn't leak information (struct has a hole after 'scale')
We need to zero out the holes with memset().
Fixes: 6bd6a690c2 ("misc: xilinx_sdfec: Add stats & status ioctls")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dragan Cvetic <dragan.cvetic@xilinx.com>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lore.kernel.org/r/20190821070606.GA26957@mwanda
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
From rdma.git
Jason Gunthorpe says:
====================
This is a collection of general cleanups for ODP to clarify some of the
flows around umem creation and use of the interval tree.
====================
The branch is based on v5.3-rc5 due to dependencies, and is being taken
into hmm.git due to dependencies in the next patches.
* odp_fixes:
RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
RDMA/mlx5: Use ib_umem_start instead of umem.address
RDMA/core: Make invalidate_range a device operation
RDMA/odp: Use kvcalloc for the dma_list and page_list
RDMA/odp: Check for overflow when computing the umem_odp end
RDMA/odp: Provide ib_umem_odp_release() to undo the allocs
RDMA/odp: Split creating a umem_odp from ib_umem_get
RDMA/odp: Make the three ways to create a umem_odp clear
RMDA/odp: Consolidate umem_odp initialization
RDMA/odp: Make it clearer when a umem is an implicit ODP umem
RDMA/odp: Iterate over the whole rbtree directly
RDMA/odp: Use the common interval tree library instead of generic
RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There are three paths through the kernel code exception logging:
- BUG (no configurable printk message)
- WARN_ON (no configurable printk message)
- WARN (configurable printk message)
LKDTM was not testing WARN_ON(). This is needed to evaluate the placement
of the "cut here" line, which needs special handling in each of the
three exceptions (and between architectures that implement instruction
exceptions to implement the code exceptions).
Signed-off-by: Kees Cook <keescook@chromium.org>
The only thing remaining of the machvecs is a few checks if we are
running on an SGI UV system. Replace those with the existing
is_uv_system() check that has been rewritten to simply check the
OEM ID directly.
That leaves us with a generic kernel that is as fast as the previous
DIG/ZX1/UV kernels, but can support all hardware. Support for UV
and the HP SBA IOMMU is now optional based on new config options.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/20190813072514.23299-27-hch@lst.de
Signed-off-by: Tony Luck <tony.luck@intel.com>
Note this also marks xp broken on ia64 now, as the UV support, which
was disable in generic kernels before actually never compiled due to
undefined uv_gpa_to_soc_phys_ram and uv_gpa_in_mmr_space symbols since
at least commit c2c9f11574 ("x86: uv: update XPC to handle updated
BIOS interface").
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/20190813072514.23299-11-hch@lst.de
Signed-off-by: Tony Luck <tony.luck@intel.com>
GRU is already using almost the same algorithm as get/put, it even
helpfully has a 10 year old comment to make this algorithm common, which
is finally happening.
There are a few differences and fixes from this conversion:
- GRU used rcu not srcu to read the hlist
- Unclear how the locking worked to prevent gru_register_mmu_notifier()
from running concurrently with gru_drop_mmu_notifier() - this version is
safe
- GRU had a release function which only set a variable without any locking
that skiped the synchronize_srcu during unregister which looks racey,
but this makes it reliable via the integrated call_srcu().
- It is unclear if the mmap_sem is actually held when
__mmu_notifier_register() was called, lockdep will now warn if this is
wrong
Link: https://lore.kernel.org/r/20190806231548.25242-5-jgg@ziepe.ca
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dimitri Sivanich <sivanich@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
xpc_send_activate_IRQ_uv is called from
<-xpc_send_activate_IRQ_part_uv
<-xpc_indicate_partition_disengaged_uv
(xpc_arch_ops.indicate_partition_disengaged)
<-xpc_die_deactivate
<-xpc_system_die
xpc_system_die is registered by atomic_notifier_chain_register,
which indicates xpc_system_die may be called in atomic context.
So the kmalloc in xpc_send_activate_IRQ_uv may be in atomic context.
Use GFP_ATOMIC instead of GFP_KERNEL in kmalloc.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Link: https://lore.kernel.org/r/20190805125625.24963-1-huangfq.daxian@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Support monitoring and detecting the SD-FEC error events
through IRQ and poll file operation.
The SD-FEC device can detect one-error or multi-error events.
An error triggers an interrupt which creates and run the ONE_SHOT
IRQ thread.
The ONE_SHOT IRQ thread detects type of error and pass that
information to the poll function.
The file_operation callback poll(), collects the events and
updates the statistics accordingly.
The function poll blocks() on waiting queue which can be
unblocked by ONE_SHOT IRQ handling thread.
Support SD-FEC interrupt set ioctl callback.
The SD-FEC can detect two type of errors: coding errors (ECC) and
a data interface errors (TLAST).
The errors are events which can trigger an IRQ if enabled.
The driver can monitor and detect these errors through IRQ.
Also the driver updates the statistical data.
Tested-by: Dragan Cvetic <dragan.cvetic@xilinx.com>
Signed-off-by: Derek Kiernan <derek.kiernan@xilinx.com>
Signed-off-by: Dragan Cvetic <dragan.cvetic@xilinx.com>
Link: https://lore.kernel.org/r/1564216438-322406-6-git-send-email-dragan.cvetic@xilinx.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This adds a simple test for forward CFI (indirect function calls) with
function prototype granularity (as implemented by Clang's CFI).
Signed-off-by: Kees Cook <keescook@chromium.org>
When unmasking IRQs inside the ASIC, the driver passes an array of all the
IRQ to unmask. The ASIC's CPU is working in LE so when running in a BE
host, the driver needs to do the proper endianness swapping when preparing
this array.
In addition, this patch also fixes the endianness of a couple of kernel log
debug messages that print values of packets
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The PQs of internal H/W queues (QMANs) can be located in different memory
areas for different ASICs. Therefore, when writing PQEs, we need to use
the correct function according to the location of the PQ. e.g. if the PQ
is located in the device's memory (SRAM or DRAM), we need to use
memcpy_toio() so it would work in architectures that have separate
address ranges for IO memory.
This patch makes the code that writes the PQE to be ASIC-specific so we
can handle this properly per ASIC.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Ben Segal <bpsegal20@gmail.com>
This patch fix the CQ irq handler to work in hosts with BE architecture.
It adds the correct endian-swapping macros around the relevant memory
accesses.
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Packets that arrive from the user and need to be parsed by the driver are
assumed to be in LE format.
This patch fix all the places where the code handles these packets and use
the correct endianness macros to handle them, as the driver handles the
packets in CPU format (LE or BE depending on the arch).
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The patch fix the DRAM usage accounting by adding a missing update of
the DRAM memory consumption, when a context is being torn down without an
organized release of the allocated memory.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
In case kernel context init fails during device initialization, both
hl_ctx_put() and kfree() are called, ending with a double free of the
kernel context.
Calling kfree() is needed only when a failure happens between the
allocation of the kernel context and its initialization, so move it to
there and remove it from the error flow.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Pull char/misc driver fixes Greg KH:
"Here are some small char/misc driver fixes for 5.3-rc4.
Two of these are for the habanalabs driver for issues found when
running on a big-endian system (are they still alive?) The others are
tiny fixes reported by people, and a MAINTAINERS update about the
location of the fpga development tree.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
coresight: Fix DEBUG_LOCKS_WARN_ON for uninitialized attribute
MAINTAINERS: Move linux-fpga tree to new location
nvmem: Use the same permissions for eeprom as for nvmem
habanalabs: fix host memory polling in BE architecture
habanalabs: fix F/W download in BE architecture