Output to the RDMA driver whether DPM mode is enabled or disabled in
the HW and if so what is the number of WIDs it supports
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for your net-next
tree. A large bunch of code cleanups, simplify the conntrack extension
codebase, get rid of the fake conntrack object, speed up netns by
selective synchronize_net() calls. More specifically, they are:
1) Check for ct->status bit instead of using nfct_nat() from IPVS and
Netfilter codebase, patch from Florian Westphal.
2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.
3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.
4) Introduce nft_is_base_chain() helper function.
5) Enforce expectation limit from userspace conntrack helper,
from Gao Feng.
6) Add nf_ct_remove_expect() helper function, from Gao Feng.
7) NAT mangle helper function return boolean, from Gao Feng.
8) ctnetlink_alloc_expect() should only work for conntrack with
helpers, from Gao Feng.
9) Add nfnl_msg_type() helper function to nfnetlink to build the
netlink message type.
10) Get rid of unnecessary cast on void, from simran singhal.
11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
also from simran singhal.
12) Use list_prev_entry() from nf_tables, from simran signhal.
13) Remove unnecessary & on pointer function in the Netfilter and IPVS
code.
14) Remove obsolete comment on set of rules per CPU in ip6_tables,
no longer true. From Arushi Singhal.
15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.
16) Remove unnecessary nested rcu_read_lock() in
__nf_nat_decode_session(). Code running from hooks are already
guaranteed to run under RCU read side.
17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.
18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
also from Aaron.
19) Get rid of unsed __ip_set_get_netlink(), from Aaron Conole.
20) Don't propagate NF_DROP error to userspace via ctnetlink in
__nf_nat_alloc_null_binding() function, from Gao Feng.
21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
from Gao Feng.
22) Kill the fake untracked conntrack objects, use ctinfo instead to
annotate a conntrack object is untracked, from Florian Westphal.
23) Remove nf_ct_is_untracked(), now obsolete since we have no
conntrack template anymore, from Florian.
24) Add event mask support to nft_ct, also from Florian.
25) Move nf_conn_help structure to
include/net/netfilter/nf_conntrack_helper.h.
26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
Thus, we don't deal with variable conntrack extensions anymore.
Make sure userspace conntrack helper doesn't go over that size.
Remove variable size ct extension infrastructure now this code
got no more clients. From Florian Westphal.
27) Restore offset and length of nf_ct_ext structure to 8 bytes now
that wraparound is not possible any longer, also from Florian.
28) Allow to get rid of unassured flows under stress in conntrack,
this applies to DCCP, SCTP and TCP protocols, from Florian.
29) Shrink size of nf_conntrack_ecache structure, from Florian.
30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
from Gao Feng.
31) Register SYNPROXY hooks on demand, from Florian Westphal.
32) Use pernet hook whenever possible, instead of global hook
registration, from Florian Westphal.
33) Pass hook structure to ebt_register_table() to consolidate some
infrastructure code, from Florian Westphal.
34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
SYNPROXY code, to make sure device stats are not fooled, patch
from Gao Feng.
35) Remove NF_CT_EXT_F_PREALLOC this kills quite some code that we
don't need anymore if we just select a fixed size instead of
expensive runtime time calculation of this. From Florian.
36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
from Florian.
37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
Florian.
38) Attach NAT extension on-demand from masquerade and pptp helper
path, from Florian.
39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.
40) Speed up netns by selective calls of synchronize_net(), from
Florian Westphal.
41) Silence stack size warning gcc in 32-bit arch in snmp helper,
from Florian.
42) Inconditionally call nf_ct_ext_destroy(), even if we have no
extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
Liping Zhang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers usually have a number of restrictions for running XDP
- most common being buffer sizes, LRO and number of rings.
Even though some drivers try to be helpful and print error
messages experience shows that users don't often consult
kernel logs on netlink errors. Try to use the new extended
ack mechanism to carry the message back to user space.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we propagate extended ack reporting throughout various paths in
the kernel it may be that the same function is called with the
extended ack parameter passed as NULL. One place where that happens
is in drivers which have a centralized reconfiguration function
called both from ndos and from ethtool_ops. Add a new helper for
setting the error message in such conditions.
Existing helper is left as is to encourage propagating the ext act
fully wherever possible. It also makes it clear in the code which
messages may be lost due to ext ack being NULL.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When reading the ring buffer for consuming, it is optimized for splice,
where a page is taken out of the ring buffer (zero copy) and sent to the
reading consumer. When the read is finished with the page, it calls
ring_buffer_free_read_page(), which simply frees the page. The next time the
reader needs to get a page from the ring buffer, it must call
ring_buffer_alloc_read_page() which allocates and initializes a reader page
for the ring buffer to be swapped into the ring buffer for a new filled page
for the reader.
The problem is that there's no reason to actually free the page when it is
passed back to the ring buffer. It can hold it off and reuse it for the next
iteration. This completely removes the interaction with the page_alloc
mechanism.
Using the trace-cmd utility to record all events (causing trace-cmd to
require reading lots of pages from the ring buffer, and calling
ring_buffer_alloc/free_read_page() several times), and also assigning a
stack trace trigger to the mm_page_alloc event, we can see how many times
the ring_buffer_alloc_read_page() needed to allocate a page for the ring
buffer.
Before this change:
# trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
# trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
9968
After this change:
# trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
# trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
4
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Userspace prefers the driver having a status property over having to guess
itself. Specifically this will properly make the GNOME3 UI (and likely
others) properly show discharging / charging / full status, instead
of always showing discharging as status.
Note that in the case there is no charger driver supplying the max17042,
then a status of unknown will get returned. At least upower treats
this the same as not having a status attribute, so in this case nothing
changes from a userspace pov.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Some x86 machines use a max17047 fuel-gauge and x86 might be missing
platform_data if not provided by SFI.
This commit adds default platform_data as fallback option so that the
driver can work on boards where no platform_data is provided.
Since not all boards have a thermistor hooked up, set temp_min to 0 and
change the health checks from temp <= temp_min to temp < temp_min to
not trigger on such boards (where temp reads 0).
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
For NF_NAT_MANIP_SRC, we will insert the ct to the nat_bysource_table,
then remove it from the nat_bysource_table via nat_extend->destroy.
But now, the nat extension is attached on demand, so if the nat extension
is not attached, we will not be notified when the ct is destroyed, i.e.
we may fail to remove ct from the nat_bysource_table.
So just keep it simple, even if the extension is not attached, we will
still invoke the related ext->destroy. And this will also preserve the
flexibility for the future extension.
Fixes: 9a08ecfe74 ("netfilter: don't attach a nat extension by default")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Simon Horman says:
====================
Third Round of IPVS Updates for v4.12
please consider these enhancements to IPVS for v4.12.
If it is too late for v4.12 then please consider them for v4.13.
* Remove unused function
* Correct comparison of unsigned value
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_unregister_net_hook(s) can avoid a second call to synchronize_net,
provided there is no nfqueue active in that net namespace (which is
the common case).
This also gets rid of the extra arg to nf_queue_nf_hook_drop(), normally
this gets called during netns cleanup so no packets should be queued.
For the rare case of base chain being unregistered or module removal
while nfqueue is in use the extra hiccup due to the packet drops isn't
a big deal.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch drops support for AVR32 architecture from the Linux kernel.
The AVR32 architecture is not keeping up with the development of the
kernel, and since it shares so much of the drivers with Atmel ARM SoC,
it is starting to hinder these drivers to develop swiftly.
Also, all AVR32 AP7 SoC processors are end of lifed from Atmel (now
Microchip).
Finally, the GCC toolchain is stuck at version 4.2.x, and has not
received any patches since the last release from Atmel;
4.2.4-atmel.1.1.3.avr32linux.1. When building kernel v4.10, this
toolchain is no longer able to properly link the network stack.
Haavard and I have came to the conclusion that we feel keeping AVR32 on
life support offers more obstacles for Atmel ARMs, than it gives joy to
AVR32 users. I also suspect there are very few AVR32 users left today,
if anybody at all.
Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Håvard Skinnemoen <hskinnemoen@gmail.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
The x86 conversion to the generic GUP code included a small change which causes
crashes and data corruption in the pmem code - not good.
The root cause is that the /dev/pmem driver code implicitly relies on the x86
get_user_pages() implementation doing a get_page() on the page refcount, because
get_page() does a get_zone_device_page() which properly refcounts pmem's separate
page struct arrays that are not present in the regular page struct structures.
(The pmem driver does this because it can cover huge memory areas.)
But the x86 conversion to the generic GUP code changed the get_page() to
page_cache_get_speculative() which is faster but doesn't do the
get_zone_device_page() call the pmem code relies on.
One way to solve the regression would be to change the generic GUP code to use
get_page(), but that would slow things down a bit and punish other generic-GUP
using architectures for an x86-ism they did not care about. (Arguably the pmem
driver was probably not working reliably for them: but nvdimm is an Intel
feature, so non-x86 exposure is probably still limited.)
So restructure the pmem code's interface with the MM instead: get rid of the
get/put_zone_device_page() distinction, integrate put_zone_device_page() into
__put_page() and and restructure the pmem completion-wait and teardown machinery:
Kirill points out that the calls to {get,put}_dev_pagemap() can be
removed from the mm fast path if we take a single get_dev_pagemap()
reference to signify that the page is alive and use the final put of the
page to drop that reference.
This does require some care to make sure that any waits for the
percpu_ref to drop to zero occur *after* devm_memremap_page_release(),
since it now maintains its own elevated reference.
This speeds up things while also making the pmem refcounting more robust going
forward.
Suggested-by: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/149339998297.24933.1129582806028305912.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The description inside uapi/linux/bpf.h about bpf_get_socket_uid
helper function is no longer valid. It returns overflowuid rather
than 0 when failed.
Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When IP tunnel encapsulation rules are offloaded, the kernel can't see
the traffic of the offloaded flow. The neighbour for the IP tunnel
destination of the offloaded flow can mistakenly become STALE and
deleted by the kernel since its 'used' value wasn't changed.
To make sure that a neighbour which is used by the HW won't become
STALE, we proactively update the neighbour 'used' value every
DELAY_PROBE_TIME period, when packets were matched and counted by the HW
for one of the tunnel encap flows related to this neighbour.
The periodic task that updates the used neighbours is scheduled when a
tunnel encap rule is successfully offloaded into HW and keeps re-scheduling
itself as long as the representor's neighbours list isn't empty.
Add, remove, lookup and status change operations done over the
representor's neighbours list or the neighbour hash entry encaps list
are all serialized by RTNL lock.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Since commit 5093bb965a ("powerpc/QE: switch to the cpm_muram
implementation"), muram area is not part of immrbar mapping anymore
so immrbar_virt_to_phys() is not usable anymore.
Fixes: 5093bb965a ("powerpc/QE: switch to the cpm_muram implementation")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Li Yang <pku.leo@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>
This commit removes __packed from fscrypt_policy as it does not contain
any implicit padding and does not refer to an on-disk structure. Even
though this is a change to a UAPI file, no users will be broken as the
structure doesn't change.
Signed-off-by: Joe Richey <joerichey@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This commit exposes the necessary constants and structures for a
userspace program to pass filesystem encryption keys into the keyring.
The fscrypt_key structure was already part of the kernel ABI, this
change just makes it so programs no longer have to redeclare these
structures (like e4crypt in e2fsprogs currently does).
Note that we do not expose the other FS_*_KEY_SIZE constants as they are
not necessary. Only XTS is supported for contents_encryption_mode, so
currently FS_MAX_KEY_SIZE bytes of key material must always be passed to
the kernel.
This commit also removes __packed from fscrypt_key as it does not
contain any implicit padding and does not refer to an on-disk structure.
Signed-off-by: Joe Richey <joerichey@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
fscrypt_symlink_data_len() is never called and can be removed.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Support the GETFSMAP ioctls so that we can use the xfs free space
management tools to probe ext4 as well. Note that this is a partial
implementation -- we only report fixed-location metadata and free space;
everything else is reported as "unknown".
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add the GETFSMAP headers to the VFS kernel headers
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Toshi noticed that the new support for a region-level badblocks missed
the case where errors are cleared due to BTT I/O.
An initial attempt to fix this ran into a "sleeping while atomic"
warning due to taking the nvdimm_bus_lock() in the BTT I/O path to
satisfy the locking requirements of __nvdimm_bus_badblocks_clear().
However, that lock is not needed since we are not acting on any data that
is subject to change under that lock. The badblocks instance has its own
internal lock to handle mutations of the error list.
So, in order to make it clear that we are just acting on region devices,
rename __nvdimm_bus_badblocks_clear() to nvdimm_clear_badblocks_regions().
Eliminate the lock and consolidate all support routines for the new
nvdimm_account_cleared_poison() in drivers/nvdimm/bus.c. Finally, to the
opportunity to cleanup to some unnecessary casts, make the calling
convention of nvdimm_clear_badblocks_regions() clearer by replacing struct
resource with the minimal struct clear_badblocks_context, and use the
DEVICE_ATTR macro.
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The include file does not need any PCI specifics, so remove
that include. Also fix the places that relied on it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Both opa_vnic and the hfi driver use the same opa_classport_info
definition. We will also have ib_sa capable of querying opa class
port info and would need this definition. Move it to ib_mad.h
for everyone to use.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This was already disabled a while ago because it caused I/O errors,
and it's severly getting into the way of the discard / write zeroes
rework.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
* acpi-scan:
ACPI / scan: Avoid enumerating devices more than once
ACPI / scan: Apply default enumeration to devices with ACPI drivers
ACPI / scan: Drop support for force_remove
* acpi-tables:
ACPI / tables: Drop acpi_parse_entries() which is not used
* acpi-platform:
ACPI / platform: Update platform device NUMA node based on _PXM method
sequence is protected by spinlock so don't access sequence
in paramter seq when invoking this function.
~0 means to get the latest sequence number and 0 means none to
get.
Change-Id: Ib7a03f3cf5594deeb4ad333cc59b47a6bddfd1ad
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ACPICA commit c04d310039d3e0ed1cb62876fe7e596fbc75ab01
ACPICA commit a65c1df7e6b4bad8e37df822018c40c6c446add9
The key feature of this utility is that the original comments within
the input ASL files are preserved during the conversion process, and
included within the converted ASL+ file -- thus creating a transparent
conversion of existing ASL files to ASL+ (ASL 2.0)
This patch is an automatic generation of the ASL converter commit,
Linux kernel isn't affected by the functionality provided in this
commit, but requires the linuxized changes to support future ACPICA
release automation.
Link: https://github.com/acpica/acpica/commit/c04d3100
Link: https://github.com/acpica/acpica/commit/a65c1df7
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Jung-uk Kim <jkim@FreeBSD.org>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In some SoCs, setting noreboot bit needs modification to
PMC GC registers. But not all PMC drivers allow other drivers
to memory map their GC region. This could create mem request
conflict in watchdog driver. So this patch adds facility to allow
PMC drivers to pass noreboot update function to watchdog
drivers via platform data.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Johannes Berg says:
====================
Another set of patches for -next:
* API support for concurrent scheduled scan requests
* API changes for roaming reporting
* BSS max idle support in mac80211
* API changes for TX status reporting in mac80211
* API changes for RX rate reporting in mac80211
* rewrite monitor logic to prepare for BPF filters
* bugfix for rare devices without 2.4 GHz support
* a bugfix for recent DFS changes
* some further cleanups
The API changes are actually at a nice time, since it's
typically quiet just before the merge window, and trees
can be synchronized easily during it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>