The current implementation of create CQ requires contiguous
memory, such requirement is problematic once the memory is
fragmented or the system is low in memory, it causes for
failures in dma_zalloc_coherent().
This patch implements new scheme of fragmented CQ to overcome
this issue by introducing new type: 'struct mlx5_frag_buf_ctrl'
to allocate fragmented buffers, rather than contiguous ones.
Base the Completion Queues (CQs) on this new fragmented buffer.
It fixes following crashes:
kworker/29:0: page allocation failure: order:6, mode:0x80d0
CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<>] dump_stack+0x19/0x1b
[<>] warn_alloc_failed+0x110/0x180
[<>] __alloc_pages_slowpath+0x6b7/0x725
[<>] __alloc_pages_nodemask+0x405/0x420
[<>] dma_generic_alloc_coherent+0x8f/0x140
[<>] x86_swiotlb_alloc_coherent+0x21/0x50
[<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
[<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
[<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
[<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
[<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
[<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
EQ structure and API is private to mlx5_core driver only, external
drivers should not have access or the means to manipulate EQ objects.
Remove redundant exports and move API functions out of the linux/mlx5
include directory into the driver's mlx5_core.h private include file.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Since CQ tree is now per EQ, CQ completion and event forwarding became
specific implementation of EQ logic, this patch moves that logic to eq.c
and makes those functions static.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Now as the CQ table is per EQ, add an API to hold/put CQ to be used from
eq.c in downstream patch.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Before this patch the driver had one CQ database protected via one
spinlock, this spinlock is meant to synchronize between CQ
adding/removing and CQ IRQ interrupt handling.
On a system with large number of CPUs and on a work load that requires
lots of interrupts, this global spinlock becomes a very nasty hotspot
and introduces a contention between the active cores, which will
significantly hurt performance and becomes a bottleneck that prevents
seamless cpu scaling.
To solve this we simply move the CQ database and its spinlock to be per
EQ (IRQ), thus per core.
Tested with:
system: 2 sockets, 14 cores per socket, hyperthreading, 2x14x2=56 cores
netperf command: ./super_netperf 200 -P 0 -t TCP_RR -H <server> -l 30 -- -r 300,300 -o -s 1M,1M -S 1M,1M
WITHOUT THIS PATCH:
Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
Average: all 4.32 0.00 36.15 0.09 0.00 34.02 0.00 0.00 0.00 25.41
Samples: 2M of event 'cycles:pp', Event count (approx.): 1554616897271
Overhead Command Shared Object Symbol
+ 14.28% swapper [kernel.vmlinux] [k] intel_idle
+ 12.25% swapper [kernel.vmlinux] [k] queued_spin_lock_slowpath
+ 10.29% netserver [kernel.vmlinux] [k] queued_spin_lock_slowpath
+ 1.32% netserver [kernel.vmlinux] [k] mlx5e_xmit
WITH THIS PATCH:
Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
Average: all 4.27 0.00 34.31 0.01 0.00 18.71 0.00 0.00 0.00 42.69
Samples: 2M of event 'cycles:pp', Event count (approx.): 1498132937483
Overhead Command Shared Object Symbol
+ 23.33% swapper [kernel.vmlinux] [k] intel_idle
+ 1.69% netserver [kernel.vmlinux] [k] mlx5e_xmit
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Pull x86 fixes from Ingo Molnar:
"Misc fixes all across the map:
- /proc/kcore vsyscall related fixes
- LTO fix
- build warning fix
- CPU hotplug fix
- Kconfig NR_CPUS cleanups
- cpu_has() cleanups/robustification
- .gitignore fix
- memory-failure unmapping fix
- UV platform fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages
x86/error_inject: Make just_return_func() globally visible
x86/platform/UV: Fix GAM Range Table entries less than 1GB
x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore
x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally
vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
x86/Kconfig: Further simplify the NR_CPUS config
x86/Kconfig: Simplify NR_CPUS config
x86/MCE: Fix build warning introduced by "x86: do not use print_symbol()"
x86/cpufeature: Update _static_cpu_has() to use all named variables
x86/cpufeature: Reindent _static_cpu_has()
Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
"Here's the latest set of Spectre and PTI related fixes and updates:
Spectre:
- Add entry code register clearing to reduce the Spectre attack
surface
- Update the Spectre microcode blacklist
- Inline the KVM Spectre helpers to get close to v4.14 performance
again.
- Fix indirect_branch_prediction_barrier()
- Fix/improve Spectre related kernel messages
- Fix array_index_nospec_mask() asm constraint
- KVM: fix two MSR handling bugs
PTI:
- Fix a paranoid entry PTI CR3 handling bug
- Fix comments
objtool:
- Fix paranoid_entry() frame pointer warning
- Annotate WARN()-related UD2 as reachable
- Various fixes
- Add Add Peter Zijlstra as objtool co-maintainer
Misc:
- Various x86 entry code self-test fixes
- Improve/simplify entry code stack frame generation and handling
after recent heavy-handed PTI and Spectre changes. (There's two
more WIP improvements expected here.)
- Type fix for cache entries
There's also some low risk non-fix changes I've included in this
branch to reduce backporting conflicts:
- rename a confusing x86_cpu field name
- de-obfuscate the naming of single-TLB flushing primitives"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
x86/entry/64: Fix CR3 restore in paranoid_exit()
x86/cpu: Change type of x86_cache_size variable to unsigned int
x86/spectre: Fix an error message
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
selftests/x86/mpx: Fix incorrect bounds with old _sigfault
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
x86/speculation: Add <asm/msr-index.h> dependency
nospec: Move array_index_nospec() parameter checking into separate macro
x86/speculation: Fix up array_index_nospec_mask() asm constraint
x86/debug: Use UD2 for WARN()
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
objtool: Fix segfault in ignore_unreachable_insn()
selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
selftests/x86/pkeys: Remove unused functions
selftests/x86: Clean up and document sscanf() usage
selftests/x86: Fix vDSO selftest segfault for vsyscall=none
x86/entry/64: Remove the unused 'icebp' macro
...
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2018-02-14
This patch series enables the new mqprio hardware offload mechanism
creating traffic classes on VFs for XL710 devices. The parameters
needed to configure these traffic classes/queue channels are provides
by the user via the tc tool. A maximum of four traffic classes can be
created on each VF. This patch series also enables application of cloud
filters to each of these traffic classes. The cloud filters are applied
using the tc-flower classifier.
Example:
1. tc qdisc add dev vf0 root mqprio num_tc 4 map 0 0 0 0 1 2 2 3\
queues 2@0 2@2 1@4 1@5 hw 1 mode channel
2. tc qdisc add dev vf0 ingress
3. ethtool -K vf0 hw-tc-offload on
4. ip link set eth0 vf 0 spoofchk off
5. tc filter add dev vf0 protocol ip parent ffff: prio 1 flower dst_ip\
192.168.3.5/32 ip_proto udp dst_port 25 skip_sw hw_tc 2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When NET_PTP_CLASSIFY is disabled, a stub function is required in
order that the drivers compile.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The omap_mcbsp_dev_attr data was used to supply instance-specific
data for legacy non-DT devices. The legacy McBSP device support
including the usage of the hwmod class revision data has been
dropped in commit 48f6693790 ("ARM: OMAP2+: Remove unused legacy
code for McBSP") and this data is therefore no longer needed.
So, cleanup the structure and all the associated data in various
hwmod data files.
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The omap2_spi_dev_attr data was used to supply instance-specific
data for legacy non-DT devices. The SPI legacy device support
including the usage of the hwmod class revision data has been
dropped in commit 6f3ab009a1 ("ARM: OMAP2+: Remove unused legacy
code for device init") and this data is therefore no longer needed.
So, cleanup the structure and all the associated data in various
hwmod data files.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The omap_gpio_dev_attr data was used to supply instance-specific
data for legacy non-DT devices. The GPIO legacy device support has
been cleaned up in commit 14944934f8 ("ARM: OMAP2+: Remove legacy
gpio code") a while ago and this data is therefore no longer needed.
So, cleanup the structure and all the associated data in various
hwmod data files.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
This patch adds infrastructure to send virtchnl messages to the
PF to configure filters on the VF. The patch adds a struct
called virtchnl_filter which contains information about the fields
in the user-specified tc filter.
Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch defines new structs in support of the virtchannel message
that the VF sends to the PF to create a queue channel specified by the
user via tc tool.
Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This converts the bit-banged GPIO SPI driver to looking up and
using GPIO descriptors to get a handle on GPIO lines for SCK,
MOSI, MISO and all CS lines.
All existing board files are converted in one go to keep it all
consistent. With these conversions I rarely find any interrim
steps that makes any sense.
Device tree probing and GPIO handling should work like before
also after this patch.
For board files, we stop using controller data to pass the GPIO
line for chip select, instead we pass this as a GPIO descriptor
lookup like everything else.
In some s3c24xx machines the names of the SPI devices were set to
"spi-gpio" rather than "spi_gpio" which can never have worked, I
fixed it working (I guess) as part of this patch set. Sometimes
I wonder how this code got upstream in the first place, it
obviously is not tested.
mach-s3c64xx/mach-smartq.c has the same problem and additionally
defines the *same* GPIO line for MOSI and MISO which is not going
to be accepted by gpiolib. As the lines were number 1,2,2 I assumed
it was a typo and use lines 1,2,3. A comment gives awat that line 0
is chip select though no actual SPI device is provided for the LCD
supposed to be on this bit-banged SPI bus. I left it intact instead
of just deleting the bus though.
Kill off board file code that try to initialize the SPI lines
to the same values that they will later be set by the spi_gpio
driver anyways. Given the huge number of weird things in these
board files I do not think this code is very tested or put in
with much afterthought anyways.
In order to assert that we do not get performance regressions on
this crucial bing-banged driver, a ran a script like this dumping the
Ilitek ILI9322 regmap 10000 times (it has no caching obviously) on
an otherwise idle system in two iterations before and after the
patches:
#!/bin/sh
for run in `seq 10000`
do
cat /debug/regmap/spi0.0/registers > /dev/null
done
Before the patch:
time test.sh
real 3m 41.03s
user 0m 29.41s
sys 3m 7.22s
time test.sh
real 3m 44.24s
user 0m 32.31s
sys 3m 7.60s
After the patch:
time test.sh
real 3m 41.32s
user 0m 28.92s
sys 3m 8.08s
time test.sh
real 3m 39.92s
user 0m 30.20s
sys 3m 5.56s
So any performance differences seems to be in the error margin.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Olof Johansson <olof@lixom.net>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
On some platforms msi parent address regions have to be excluded from
normal IOVA allocation in that they are detected and decoded in a HW
specific way by system components and so they cannot be considered normal
IOVA address space.
Add a helper function that retrieves ITS address regions - the msi
parent - through IORT device <-> ITS mappings and reserves it so that
these regions will not be translated by IOMMU and will be excluded from
IOVA allocations. The function checks for the smmu model number and
only applies the msi reservation if the platform requires it.
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
[For the ITS part]
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If is sufficient with a forward declaration of struct xdp_rxq_info in
linux/filter.h, which avoids including net/xdp.h. This was originally
suggested by John Fastabend during the review phase, but wasn't
included in the final patchset revision. Thus, this followup.
Suggested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, iommu_unmap, iommu_unmap_fast and iommu_map_sg return
size_t. However, some of the return values are error codes (< 0),
which can be misinterpreted as large size. Therefore, returning size 0
instead to signify failure to map/unmap.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
So one could decode them without opening the specification.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently, the mutex is mostly used to protect pernet operations
list. It orders setup_net() and cleanup_net() with parallel
{un,}register_pernet_operations() calls, so ->exit{,batch} methods
of the same pernet operations are executed for a dying net, as
were used to call ->init methods, even after the net namespace
is unlinked from net_namespace_list in cleanup_net().
But there are several problems with scalability. The first one
is that more than one net can't be created or destroyed
at the same moment on the node. For big machines with many cpus
running many containers it's very sensitive.
The second one is that it's need to synchronize_rcu() after net
is removed from net_namespace_list():
Destroy net_ns:
cleanup_net()
mutex_lock(&net_mutex)
list_del_rcu(&net->list)
synchronize_rcu() <--- Sleep there for ages
list_for_each_entry_reverse(ops, &pernet_list, list)
ops_exit_list(ops, &net_exit_list)
list_for_each_entry_reverse(ops, &pernet_list, list)
ops_free_list(ops, &net_exit_list)
mutex_unlock(&net_mutex)
This primitive is not fast, especially on the systems with many processors
and/or when preemptible RCU is enabled in config. So, all the time, while
cleanup_net() is waiting for RCU grace period, creation of new net namespaces
is not possible, the tasks, who makes it, are sleeping on the same mutex:
Create net_ns:
copy_net_ns()
mutex_lock_killable(&net_mutex) <--- Sleep there for ages
I observed 20-30 seconds hangs of "unshare -n" on ordinary 8-cpu laptop
with preemptible RCU enabled after CRIU tests round is finished.
The solution is to convert net_mutex to the rw_semaphore and add fine grain
locks to really small number of pernet_operations, what really need them.
Then, pernet_operations::init/::exit methods, modifying the net-related data,
will require down_read() locking only, while down_write() will be used
for changing pernet_list (i.e., when modules are being loaded and unloaded).
This gives signify performance increase, after all patch set is applied,
like you may see here:
%for i in {1..10000}; do unshare -n bash -c exit; done
*before*
real 1m40,377s
user 0m9,672s
sys 0m19,928s
*after*
real 0m17,007s
user 0m5,311s
sys 0m11,779
(5.8 times faster)
This patch starts replacing net_mutex to net_sem. It adds rw_semaphore,
describes the variables it protects, and makes to use, where appropriate.
net_mutex is still present, and next patches will kick it out step-by-step.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the following commit:
ce0fa3e56a ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
... we added code to memory_failure() to unmap the page from the
kernel 1:1 virtual address space to avoid speculative access to the
page logging additional errors.
But memory_failure() may not always succeed in taking the page offline,
especially if the page belongs to the kernel. This can happen if
there are too many corrected errors on a page and either mcelog(8)
or drivers/ras/cec.c asks to take a page offline.
Since we remove the 1:1 mapping early in memory_failure(), we can
end up with the page unmapped, but still in use. On the next access
the kernel crashes :-(
There are also various debug paths that call memory_failure() to simulate
occurrence of an error. Since there is no actual error in memory, we
don't need to map out the page for those cases.
Revert most of the previous attempt and keep the solution local to
arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:
1) there is a real error
2) memory_failure() succeeds.
All of this only applies to 64-bit systems. 32-bit kernel doesn't map
all of memory into kernel space. It isn't worth adding the code to unmap
the piece that is mapped because nobody would run a 32-bit kernel on a
machine that has recoverable machine checks.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert (Persistent Memory) <elliott@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org #v4.14
Fixes: ce0fa3e56a ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
remoteproc instances can be stopped either by invoking shutdown or by an
attempt to recover from a crash. For some subdev types it's expected to
clean up gracefully during a shutdown, but are unable to do so during a
crash - so pass this information to the subdev remove functions.
Acked-By: Chris Lew <clew@codeaurora.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Changes since v1:
Added changes in these files:
drivers/infiniband/hw/usnic/usnic_transport.c
drivers/staging/lustre/lnet/lnet/lib-socket.c
drivers/target/iscsi/iscsi_target_login.c
drivers/vhost/net.c
fs/dlm/lowcomms.c
fs/ocfs2/cluster/tcp.c
security/tomoyo/network.c
Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.
"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.
None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.
This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always >= 0 and can be differentiated
from an error.
Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as &buflen and subsequently
not used in any way.
Userspace API is not changed.
text data bss dec hex filename
30108430 2633624 873672 33615726 200ef6e vmlinux.before.o
30108109 2633612 873672 33615393 200ee21 vmlinux.o
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-bluetooth@vger.kernel.org
CC: linux-decnet-user@lists.sourceforge.net
CC: linux-wireless@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: linux-sctp@vger.kernel.org
CC: linux-nfs@vger.kernel.org
CC: linux-x25@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to implement support for grabbing core dumps in remoteproc it's
necessary to know the relocated base of the image, as the offsets from
the virtual memory base might not be based on the physical address.
Return the adjusted physical base address to the caller.
Acked-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
The resource table is just one possible source of information that can
be extracted from the firmware file. Generalize this interface to allow
drivers to override this with parsers of other types of information.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
As the remoteproc framework restarts the remote processor after a fatal
event, it's useful to be able to acquire a coredump of the remote
processor's state, for post mortem debugging.
This patch introduces a mechanism for extracting the memory contents
after the remote has stopped and before the restart sequence has begun
in the recovery path. The remoteproc framework builds the core dump in
memory and use devcoredump to expose this to user space.
Signed-off-by: Sarangdhar Joshi <spjoshi@codeaurora.org>
[bjorn: Use vmalloc instead of composing the ELF on the fly]
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- oversize stack frames on mn10300 in sha3-generic
- warning on old compilers in sha3-generic
- API error in sun4i_ss_prng
- potential dead-lock in sun4i_ss_prng
- null-pointer dereference in sha512-mb
- endless loop when DECO acquire fails in caam
- kernel oops when hashing empty message in talitos"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: sun4i_ss_prng - convert lock to _bh in sun4i_ss_prng_generate
crypto: sun4i_ss_prng - fix return value of sun4i_ss_prng_generate
crypto: caam - fix endless loop when DECO acquire fails
crypto: sha3-generic - Use __optimize to support old compilers
compiler-gcc.h: __nostackprotector needs gcc-4.4 and up
compiler-gcc.h: Introduce __optimize function attribute
crypto: sha3-generic - deal with oversize stack frames
crypto: talitos - fix Kernel Oops on hashing an empty file
crypto: sha512-mb - initialize pending lengths correctly
Not a real RAID level, but some HBAs support JBOD in addition to the
'classical' RAID levels.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Platforms like 96boards have a standardized connector/expansion
slot that exposes signals like GPIOs to expansion boards in an
SoC agnostic way. We'd like the DT overlays for the expansion
boards to be written once without knowledge of the SoC on the
other side of the connector. This avoids the unscalable
combinatorial explosion of a different DT overlay for each
expansion board and SoC pair.
We need a way to describe the GPIOs routed through the connector
in an SoC agnostic way. Let's introduce nexus property parsing
into the OF core to do this. This is largely based on the
interrupt nexus support we already have. This allows us to remap
a phandle list in a consumer node (e.g. reset-gpios) through a
connector in a generic way (e.g. via gpio-map). Do this in a
generic routine so that we can remap any sort of variable length
phandle list.
Taking GPIOs as an example, the connector would be a GPIO nexus,
supporting the remapping of a GPIO specifier space to multiple
GPIO providers on the SoC. DT would look as shown below, where
'soc_gpio1' and 'soc_gpio2' are inside the SoC, 'connector' is an
expansion port where boards can be plugged in, and
'expansion_device' is a device on the expansion board.
soc {
soc_gpio1: gpio-controller1 {
#gpio-cells = <2>;
};
soc_gpio2: gpio-controller2 {
#gpio-cells = <2>;
};
};
connector: connector {
#gpio-cells = <2>;
gpio-map = <0 0 &soc_gpio1 1 0>,
<1 0 &soc_gpio2 4 0>,
<2 0 &soc_gpio1 3 0>,
<3 0 &soc_gpio2 2 0>;
gpio-map-mask = <0xf 0x0>;
gpio-map-pass-thru = <0x0 0x1>
};
expansion_device {
reset-gpios = <&connector 2 GPIO_ACTIVE_LOW>;
};
The GPIO core would use of_parse_phandle_with_args_map() instead
of of_parse_phandle_with_args() and arrive at the same type of
result, a phandle and argument list. The difference is that the
phandle and arguments will be remapped through the nexus node to
the underlying SoC GPIO controller node. In the example above,
we would remap 'reset-gpios' from <&connector 2 GPIO_ACTIVE_LOW>
to <&soc_gpio1 3 GPIO_ACTIVE_LOW>.
Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Commit f859422075 (x86: PM: Make APM idle driver initialize polling
state) made apm_init() call cpuidle_poll_state_init(), but that only
is defined for CONFIG_CPU_IDLE set, so make the empty stub of it
available for CONFIG_CPU_IDLE unset too to fix the resulting build
issue.
Fixes: f859422075 (x86: PM: Make APM idle driver initialize polling state)
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When CONFIG_DMA_ENGINE_RAID is enabled, unmap pool size can reach to
256. But in struct dmaengine_unmap_data, map_cnt is only u8, wrapping
to 0, if the unmap pool is maximally used. This triggers BUG() when
struct dmaengine_unmap_data is freed. Use u16 to fix the problem.
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Switch to use dividing to prevent integer overflow when size is too
big to calculate allocation size properly.
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Fixes: 6e6e41c311 ("ptr_ring: fail early if queue occupies more than KMALLOC_MAX_SIZE")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
except, again, POLLFREE and POLL_BUSY_LOOP.
With this, we finally get to the promised end result:
- POLL{IN,OUT,...} are plain integers and *not* in __poll_t, so any
stray instances of ->poll() still using those will be caught by
sparse.
- eventpoll.c and select.c warning-free wrt __poll_t
- no more kernel-side definitions of POLL... - userland ones are
visible through the entire kernel (and used pretty much only for
mangle/demangle)
- same behavior as after the first series (i.e. sparc et.al. epoll(2)
working correctly).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
with de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.
The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more poll annotation updates from Al Viro:
"This is preparation to solving the problems you've mentioned in the
original poll series.
After this series, the kernel is ready for running
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
as a for bulk search-and-replace.
After that, the kernel is ready to apply the patch to unify
{de,}mangle_poll(), and then get rid of kernel-side POLL... uses
entirely, and we should be all done with that stuff.
Basically, that's what you suggested wrt KPOLL..., except that we can
use EPOLL... instead - they already are arch-independent (and equal to
what is currently kernel-side POLL...).
After the preparations (in this series) switch to returning EPOLL...
from ->poll() instances is completely mechanical and kernel-side
POLL... can go away. The last step (killing kernel-side POLL... and
unifying {de,}mangle_poll() has to be done after the
search-and-replace job, since we need userland-side POLL... for
unified {de,}mangle_poll(), thus the cherry-pick at the last step.
After that we will have:
- POLL{IN,OUT,...} *not* in __poll_t, so any stray instances of
->poll() still using those will be caught by sparse.
- eventpoll.c and select.c warning-free wrt __poll_t
- no more kernel-side definitions of POLL... - userland ones are
visible through the entire kernel (and used pretty much only for
mangle/demangle)
- same behavior as after the first series (i.e. sparc et.al. epoll(2)
working correctly)"
* 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
annotate ep_scan_ready_list()
ep_send_events_proc(): return result via esed->res
preparation to switching ->poll() to returning EPOLL...
add EPOLLNVAL, annotate EPOLL... and event_poll->event
use linux/poll.h instead of asm/poll.h
xen: fix poll misannotation
smc: missing poll annotations
Pull KVM updates from Radim Krčmář:
"ARM:
- icache invalidation optimizations, improving VM startup time
- support for forwarded level-triggered interrupts, improving
performance for timers and passthrough platform devices
- a small fix for power-management notifiers, and some cosmetic
changes
PPC:
- add MMIO emulation for vector loads and stores
- allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
requiring the complex thread synchronization of older CPU versions
- improve the handling of escalation interrupts with the XIVE
interrupt controller
- support decrement register migration
- various cleanups and bugfixes.
s390:
- Cornelia Huck passed maintainership to Janosch Frank
- exitless interrupts for emulated devices
- cleanup of cpuflag handling
- kvm_stat counter improvements
- VSIE improvements
- mm cleanup
x86:
- hypervisor part of SEV
- UMIP, RDPID, and MSR_SMI_COUNT emulation
- paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
- allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more
AVX512 features
- show vcpu id in its anonymous inode name
- many fixes and cleanups
- per-VCPU MSR bitmaps (already merged through x86/pti branch)
- stable KVM clock when nesting on Hyper-V (merged through
x86/hyperv)"
* tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits)
KVM: PPC: Book3S: Add MMIO emulation for VMX instructions
KVM: PPC: Book3S HV: Branch inside feature section
KVM: PPC: Book3S HV: Make HPT resizing work on POWER9
KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code
KVM: PPC: Book3S PR: Fix broken select due to misspelling
KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs()
KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled
KVM: PPC: Book3S HV: Drop locks before reading guest memory
kvm: x86: remove efer_reload entry in kvm_vcpu_stat
KVM: x86: AMD Processor Topology Information
x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
kvm: embed vcpu id to dentry of vcpu anon inode
kvm: Map PFN-type memory regions as writable (if possible)
x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n
KVM: arm/arm64: Fixup userspace irqchip static key optimization
KVM: arm/arm64: Fix userspace_irqchip_in_use counting
KVM: arm/arm64: Fix incorrect timer_is_pending logic
MAINTAINERS: update KVM/s390 maintainers
MAINTAINERS: add Halil as additional vfio-ccw maintainer
MAINTAINERS: add David as a reviewer for KVM/s390
...