android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Eric Sandeen	b42db0860e	xfs: enhance dinode verifier Add several more validations to xfs_dinode_verify: - For LOCAL data fork formats, di_nextents must be 0. - For LOCAL attr fork formats, di_anextents must be 0. - For inodes with no attr fork offset, - format must be XFS_DINODE_FMT_EXTENTS if set at all - di_anextents must be 0. Thanks to dchinner for pointing out a couple related checks I had forgotten to add. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199377 Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-04-17 17:10:17 -07:00
Jens Axboe	72961c4e60	bfq-iosched: ensure to clear bic/bfqq pointers when preparing request Even if we don't have an IO context attached to a request, we still need to clear the priv[0..1] pointers, as they could be pointing to previously used bic/bfqq structures. If we don't do so, we'll either corrupt memory on dispatching a request, or cause an imbalance in counters. Inspired by a fix from Kees. Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Reported-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Fixes: `aee69d78de` ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-17 17:08:52 -06:00
Michael Ellerman	8bf24e8319	selftests/filesystems: Don't run dnotify_test by default In commit `ce290a1960` ("selftests: add devpts selftests"), the filesystems directory was added to the top-level selftests Makefile. That had the effect of causing the existing dnotify_test in the filesystems directory to now be run as part of the default selftests test-run. Unfortunately dnotify_test is actually an infinite loop. Fix it by moving dnotify_test to TEST_GEN_PROGS_EXTENDED, which says that it's a generated file (ie. built) but should not be run as part of the default test suite run (it's an "extended" test). While we're here cleanup a few other things, devpts_pts should be in TEST_GEN_PROGS to indicate that it's built, and with the above two changes we no longer need a custom all or clean rule. Fixes: `ce290a1960` ("selftests: add devpts selftests") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Christian brauner <christian.brauner@ubuntu.com> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>	2018-04-17 17:01:16 -06:00
Souptick Joarder	a5240cbde2	fs: cifs: Adding new return type vm_fault_t Use new return type vm_fault_t for page_mkwrite handler. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2018-04-17 14:44:35 -05:00
Gustavo A. R. Silva	0d568cd34e	cifs: smb2ops: Fix NULL check in smb2_query_symlink The current code null checks variable err_buf, which is always null when it is checked, hence utf16_path is free'd and the function returns -ENOENT everytime it is called, making it impossible for the execution path to reach the following code: err_buf = err_iov.iov_base; Fix this by null checking err_iov.iov_base instead of err_buf. Also, notice that err_buf no longer needs to be initialized to NULL. Addresses-Coverity-ID: 1467876 ("Logically dead code") Fixes: 2d636199e400 ("cifs: Change SMB2_open to return an iov for the error parameter") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>	2018-04-17 14:44:30 -05:00
Eric Biggers	9c438d7a3a	KEYS: DNS: limit the length of option strings Adding a dns_resolver key whose payload contains a very long option name resulted in that string being printed in full. This hit the WARN_ONCE() in set_precision() during the printk(), because printk() only supports a precision of up to 32767 bytes: precision 1000000 too large WARNING: CPU: 0 PID: 752 at lib/vsprintf.c:2189 vsnprintf+0x4bc/0x5b0 Fix it by limiting option strings (combined name + value) to a much more reasonable 128 bytes. The exact limit is arbitrary, but currently the only recognized option is formatted as "dnserror=%lu" which fits well within this limit. Also ratelimit the printks. Reproducer: perl -e 'print "#", "A" x 1000000, "\x00"' | keyctl padd dns_resolver desc @s This bug was found using syzkaller. Reported-by: Mark Rutland <mark.rutland@arm.com> Fixes: `4a2d789267` ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 15:17:41 -04:00
Bert Kenward	89bda97b44	sfc: check RSS is active for filter insert For some firmware variants - specifically 'capture packed stream' - RSS filters are not valid. We must check if RSS is actually active rather than merely enabled. Fixes: `42356d9a13` ("sfc: support RSS spreading of ethtool ntuple filters") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 15:07:21 -04:00
Lorenzo Bianconi	a2d481b326	ipv6: send netlink notifications for manually configured addresses Send a netlink notification when userspace adds a manually configured address if DAD is enabled and optimistic flag isn't set. Moreover send RTM_DELADDR notifications for tentative addresses. Some userspace applications (e.g. NetworkManager) are interested in addr netlink events albeit the address is still in tentative state, however events are not sent if DAD process is not completed. If the address is added and immediately removed userspace listeners are not notified. This behaviour can be easily reproduced by using veth interfaces: $ ip -b - <<EOF > link add dev vm1 type veth peer name vm2 > link set dev vm1 up > link set dev vm2 up > addr add 2001:db8:a:b:1:2:3:4/64 dev vm1 > addr del 2001:db8:a:b:1:2:3:4/64 dev vm1 EOF This patch reverts the behaviour introduced by the commit `f784ad3d79` ("ipv6: do not send RTM_DELADDR for tentative addresses") Suggested-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 14:03:56 -04:00
Toshiaki Makita	7ce2367254	vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi Syzkaller spotted an old bug which leads to reading skb beyond tail by 4 bytes on vlan tagged packets. This is caused because skb_vlan_tagged_multi() did not check skb_headlen. BUG: KMSAN: uninit-value in eth_type_vlan include/linux/if_vlan.h:283 [inline] BUG: KMSAN: uninit-value in skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline] BUG: KMSAN: uninit-value in vlan_features_check include/linux/if_vlan.h:672 [inline] BUG: KMSAN: uninit-value in dflt_features_check net/core/dev.c:2949 [inline] BUG: KMSAN: uninit-value in netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009 CPU: 1 PID: 3582 Comm: syzkaller435149 Not tainted 4.16.0+ #82 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 eth_type_vlan include/linux/if_vlan.h:283 [inline] skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline] vlan_features_check include/linux/if_vlan.h:672 [inline] dflt_features_check net/core/dev.c:2949 [inline] netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009 validate_xmit_skb+0x89/0x1320 net/core/dev.c:3084 __dev_queue_xmit+0x1cb2/0x2b60 net/core/dev.c:3549 dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590 packet_snd net/packet/af_packet.c:2944 [inline] packet_sendmsg+0x7c57/0x8a10 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] sock_write_iter+0x3b9/0x470 net/socket.c:909 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776 do_iter_write+0x30d/0xd40 fs/read_write.c:932 vfs_writev fs/read_write.c:977 [inline] do_writev+0x3c9/0x830 fs/read_write.c:1012 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085 SyS_writev+0x56/0x80 fs/read_write.c:1082 do_syscall_64+0x309/0x430 
arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x43ffa9 RSP: 002b:00007fff2cff3948 EFLAGS: 00000217 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ffa9 RDX: 0000000000000001 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00000000006cb018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004018d0 R13: 0000000000401960 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2737 [inline] __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:984 [inline] alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234 sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085 packet_alloc_skb net/packet/af_packet.c:2803 [inline] packet_snd net/packet/af_packet.c:2894 [inline] packet_sendmsg+0x6444/0x8a10 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] sock_write_iter+0x3b9/0x470 net/socket.c:909 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776 do_iter_write+0x30d/0xd40 fs/read_write.c:932 vfs_writev fs/read_write.c:977 [inline] do_writev+0x3c9/0x830 fs/read_write.c:1012 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085 SyS_writev+0x56/0x80 fs/read_write.c:1082 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: `58e998c6d2` ("offloading: Force software GSO for multiple vlan tags.") Reported-and-tested-by: syzbot+0bbe42c764feafa82c5a@syzkaller.appspotmail.com Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> 
Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 13:59:28 -04:00
Ganesh Goudar	a64dcddc5c	cxgb4vf: display pause settings Add support to display pause settings Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 13:54:00 -04:00
Hangbin Liu	72f6d71e49	vxlan: add ttl inherit support Like tos inherit, ttl inherit should also means inherit the inner protocol's ttl values, which actually not implemented in vxlan yet. But we could not treat ttl == 0 as "use the inner TTL", because that would be used also when the "ttl" option is not specified and that would be a behavior change, and breaking real use cases. So add a different attribute IFLA_VXLAN_TTL_INHERIT when "ttl inherit" is specified with ip cmd. Reported-by: Jianlin Shi <jishi@redhat.com> Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 13:53:13 -04:00
Samuel Mendoza-Jonas	062b3e1b6d	net/ncsi: Refactor MAC, VLAN filters The NCSI driver defines a generic ncsi_channel_filter struct that can be used to store arbitrarily formatted filters, and several generic methods of accessing data stored in such a filter. However in both the driver and as defined in the NCSI specification there are only two actual filters: VLAN ID filters and MAC address filters. The splitting of the MAC filter into unicast, multicast, and mixed is also technically not necessary as these are stored in the same location in hardware. To save complexity, particularly in the set up and accessing of these generic filters, remove them in favour of two specific structs. These can be acted on directly and do not need several generic helper functions to use. This also fixes a memory error found by KASAN on ARM32 (which is not upstream yet), where response handlers accessing a filter's data field could write past allocated memory. [ 114.926512] ================================================================== [ 114.933861] BUG: KASAN: slab-out-of-bounds in ncsi_configure_channel+0x4b8/0xc58 [ 114.941304] Read of size 2 at addr 94888558 by task kworker/0:2/546 [ 114.947593] [ 114.949146] CPU: 0 PID: 546 Comm: kworker/0:2 Not tainted 4.16.0-rc6-00119-ge156398bfcad #13 ... 
[ 115.170233] The buggy address belongs to the object at 94888540 [ 115.170233] which belongs to the cache kmalloc-32 of size 32 [ 115.181917] The buggy address is located 24 bytes inside of [ 115.181917] 32-byte region [94888540, 94888560) [ 115.192115] The buggy address belongs to the page: [ 115.196943] page:9eeac100 count:1 mapcount:0 mapping:94888000 index:0x94888fc1 [ 115.204200] flags: 0x100(slab) [ 115.207330] raw: 00000100 94888000 94888fc1 0000003f 00000001 9eea2014 9eecaa74 96c003e0 [ 115.215444] page dumped because: kasan: bad access detected [ 115.221036] [ 115.222544] Memory state around the buggy address: [ 115.227384] 94888400: fb fb fb fb fc fc fc fc 04 fc fc fc fc fc fc fc [ 115.233959] 94888480: 00 00 00 fc fc fc fc fc 00 04 fc fc fc fc fc fc [ 115.240529] >94888500: 00 00 04 fc fc fc fc fc 00 00 04 fc fc fc fc fc [ 115.247077] ^ [ 115.252523] 94888580: 00 04 fc fc fc fc fc fc 06 fc fc fc fc fc fc fc [ 115.259093] 94888600: 00 00 06 fc fc fc fc fc 00 00 04 fc fc fc fc fc [ 115.265639] ================================================================== Reported-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 13:50:58 -04:00
Eric Biggers	c210f7b411	KEYS: DNS: limit the length of option strings Adding a dns_resolver key whose payload contains a very long option name resulted in that string being printed in full. This hit the WARN_ONCE() in set_precision() during the printk(), because printk() only supports a precision of up to 32767 bytes: precision 1000000 too large WARNING: CPU: 0 PID: 752 at lib/vsprintf.c:2189 vsnprintf+0x4bc/0x5b0 Fix it by limiting option strings (combined name + value) to a much more reasonable 128 bytes. The exact limit is arbitrary, but currently the only recognized option is formatted as "dnserror=%lu" which fits well within this limit. Also ratelimit the printks. Reproducer: perl -e 'print "#", "A" x 1000000, "\x00"' | keyctl padd dns_resolver desc @s This bug was found using syzkaller. Reported-by: Mark Rutland <mark.rutland@arm.com> Fixes: `4a2d789267` ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 13:42:58 -04:00
Davide Caratti	e3c1917e45	selftest: tc_flower: add testcase for 'ip_flags' Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 13:41:54 -04:00
Stephen Suryaputra	bdb7cc643f	ipv6: Count interface receive statistics on the ingress netdev The statistics such as InHdrErrors should be counted on the ingress netdev rather than on the dev from the dst, which is the egress. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 13:39:51 -04:00
David Ahern	032234d823	net/ipv6: Make __inet6_bind static BPF core gets access to __inet6_bind via ipv6_bpf_stub_impl, so it is not invoked directly outside of af_inet6.c. Make it static and move inet6_bind after to avoid forward declaration. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 13:19:22 -04:00
Liam Girdwood	f53c4c20d6	ASoC: topology: Check widget kcontrols before deref Validate the topology input before we dereference the pointer. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>	2018-04-17 18:06:54 +01:00
Baolin Wang	e142aa09ed	timekeeping: Remove __current_kernel_time() The __current_kernel_time() function based on 'struct timespec' is no longer recommended for new code, and the only user of this function has been replaced by commit `6909e29fde` ("kdb: use __ktime_get_real_seconds instead of __current_kernel_time"). Remove the obsolete interface. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: sboyd@kernel.org Cc: broonie@kernel.org Cc: john.stultz@linaro.org Link: https://lkml.kernel.org/r/1a9dbea7ee2cda7efe9ed330874075cf17fdbff6.1523596316.git.baolin.wang@linaro.org	2018-04-17 17:18:05 +02:00
Liu, Changcheng	f0ae6a0321	timers: Remove stale struct tvec_base forward declaration struct tvec_base is a leftover of the original timer wheel implementation and not longer used. Remove the forward declaration. Signed-off-by: Liu Changcheng <changcheng.liu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Link: https://lkml.kernel.org/r/20180412075701.GA38952@sofia	2018-04-17 17:18:04 +02:00
Geert Uytterhoeven	4450dc0ae2	clockevents: Fix kernel messages split across multiple lines Convert the clockevents driver from old-style printk() to pr_info() and pr_cont(), to fix split kernel messages like below: Clockevents: could not switch to one-shot mode: dummy_timer is not functional. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: https://lkml.kernel.org/r/1522942018-14471-1-git-send-email-geert%2Brenesas@glider.be	2018-04-17 17:18:04 +02:00
David S. Miller	684009d4fd	Merge branch 'XDP-redirect-memory-return-API' Jesper Dangaard Brouer says: ==================== XDP redirect memory return API Submitted against net-next, as it contains NIC driver changes. This patchset works towards supporting different XDP RX-ring memory allocators. As this will be needed by the AF_XDP zero-copy mode. The patchset uses mlx5 as the sample driver, which gets implemented XDP_REDIRECT RX-mode, but not ndo_xdp_xmit (as this API is subject to change thought the patchset). A new struct xdp_frame is introduced (modeled after cpumap xdp_pkt). And both ndo_xdp_xmit and the new xdp_return_frame end-up using this. Support for a driver supplied allocator is implemented, and a refurbished version of page_pool is the first return allocator type introduced. This will be a integration point for AF_XDP zero-copy. The mlx5 driver evolve into using the page_pool, and see a performance increase (with ndo_xdp_xmit out ixgbe driver) from 6Mpps to 12Mpps. The patchset stop at 16 patches (one over limit), but more API changes are planned. Specifically extending ndo_xdp_xmit and xdp_return_frame APIs to support bulking. As this will address some known limits. 
V2: Updated according to Tariq's feedback V3: Updated based on feedback from Jason Wang and Alex Duyck V4: Updated based on feedback from Tariq and Jason V5: Fix SPDX license, add Tariq's reviews, improve patch desc for perf test V6: Updated based on feedback from Eric Dumazet and Alex Duyck V7: Adapt to i40e that got XDP_REDIRECT support in-between V8: Updated based on feedback kbuild test robot, and adjust for mlx5 changes page_pool only compiled into kernel when drivers Kconfig 'select' feature V9: Remove some inline statements, let compiler decide what to inline Fix return value in virtio_net driver Adjust for mlx5 changes in-between submissions V10: Minor adjust for mlx5 requested by Tariq Resubmit against net-next V11: avoid leaking info stored in frame data on page reuse ==================== Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 11:17:58 -04:00
Matt Redfearn	daf70d89f8	MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup The __clear_user function is defined to return the number of bytes that could not be cleared. From the underlying memset / bzero implementation this means setting register a2 to that number on return. Currently if a page fault is triggered within the memset_partial block, the value loaded into a2 on return is meaningless. The label .Lpartial_fixup\@ is jumped to on page fault. In order to work out how many bytes failed to copy, the exception handler should find how many bytes left in the partial block (andi a2, STORMASK), add that to the partial block end address (a2), and subtract the faulting address to get the remainder. Currently it incorrectly subtracts the partial block start address (t1), which has additionally been clobbered to generate a jump target in memset_partial. Fix this by adding the block end address instead. This issue was found with the following test code: int j, k; for (j = 0; j < 512; j++) { if ((k = clear_user(NULL, j)) != j) { pr_err("clear_user (NULL %d) returned %d\n", j, k); } } Which now passes on Creator Ci40 (MIPS32) and Cavium Octeon II (MIPS64). Suggested-by: James Hogan <jhogan@kernel.org> Signed-off-by: Matt Redfearn <matt.redfearn@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/19108/ Signed-off-by: James Hogan <jhogan@kernel.org>	2018-04-17 16:17:23 +01:00
Mark Rutland	800cb2e553	arm64: kasan: avoid pfn_to_nid() before page array is initialized In arm64's kasan_init(), we use pfn_to_nid() to find the NUMA node a span of memory is in, hoping to allocate shadow from the same NUMA node. However, at this point, the page array has not been initialized, and thus this is bogus. Since commit: `f165b378bb` ("mm: uninitialized struct page poisoning sanity") ... accessing fields of the page array results in a boot time Oops(), highlighting this problem: [ 0.000000] Unable to handle kernel paging request at virtual address dfff200000000000 [ 0.000000] Mem abort info: [ 0.000000] ESR = 0x96000004 [ 0.000000] Exception class = DABT (current EL), IL = 32 bits [ 0.000000] SET = 0, FnV = 0 [ 0.000000] EA = 0, S1PTW = 0 [ 0.000000] Data abort info: [ 0.000000] ISV = 0, ISS = 0x00000004 [ 0.000000] CM = 0, WnR = 0 [ 0.000000] [dfff200000000000] address between user and kernel address ranges [ 0.000000] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.16.0-07317-gf165b378bbdf #42 [ 0.000000] Hardware name: ARM Juno development board (r1) (DT) [ 0.000000] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 0.000000] pc : __asan_load8+0x8c/0xa8 [ 0.000000] lr : __dump_page+0x3c/0x3b8 [ 0.000000] sp : ffff2000099b7ca0 [ 0.000000] x29: ffff2000099b7ca0 x28: ffff20000a1762c0 [ 0.000000] x27: ffff7e0000000000 x26: ffff2000099dd000 [ 0.000000] x25: ffff200009a3f960 x24: ffff200008f9c38c [ 0.000000] x23: ffff20000a9d3000 x22: ffff200009735430 [ 0.000000] x21: fffffffffffffffe x20: ffff7e0001e50420 [ 0.000000] x19: ffff7e0001e50400 x18: 0000000000001840 [ 0.000000] x17: ffffffffffff8270 x16: 0000000000001840 [ 0.000000] x15: 0000000000001920 x14: 0000000000000004 [ 0.000000] x13: 0000000000000000 x12: 0000000000000800 [ 0.000000] x11: 1ffff0012d0f89ff x10: ffff10012d0f89ff [ 0.000000] x9 : 0000000000000000 x8 : ffff8009687c5000 [ 0.000000] x7 : 0000000000000000 x6 : 
ffff10000f282000 [ 0.000000] x5 : 0000000000000040 x4 : fffffffffffffffe [ 0.000000] x3 : 0000000000000000 x2 : dfff200000000000 [ 0.000000] x1 : 0000000000000005 x0 : 0000000000000000 [ 0.000000] Process swapper (pid: 0, stack limit = 0x (ptrval)) [ 0.000000] Call trace: [ 0.000000] __asan_load8+0x8c/0xa8 [ 0.000000] __dump_page+0x3c/0x3b8 [ 0.000000] dump_page+0xc/0x18 [ 0.000000] kasan_init+0x2e8/0x5a8 [ 0.000000] setup_arch+0x294/0x71c [ 0.000000] start_kernel+0xdc/0x500 [ 0.000000] Code: aa0403e0 9400063c 17ffffee d343fc00 (38e26800) [ 0.000000] ---[ end trace 67064f0e9c0cc338 ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- Let's fix this by using early_pfn_to_nid(), as other architectures do in their kasan init code. Note that early_pfn_to_nid acquires the nid from the memblock array, which we iterate over in kasan_init(), so this should be fine. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Fixes: `39d114ddc6` ("arm64: add KASAN support") Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2018-04-17 16:16:59 +01:00
Jesper Dangaard Brouer	6dfb970d3d	xdp: avoid leaking info stored in frame data on page reuse The bpf infrastructure and verifier goes to great length to avoid bpf progs leaking kernel (pointer) info. For queueing an xdp_buff via XDP_REDIRECT, xdp_frame info stores kernel info (incl pointers) in top part of frame data (xdp->data_hard_start). Checks are in place to assure enough headroom is available for this. This info is not cleared, and if the frame is reused, then a malicious user could use bpf_xdp_adjust_head helper to move xdp->data into this area. Thus, making this area readable. This is not super critical as XDP progs requires root or CAP_SYS_ADMIN, which are privileged enough for such info. An effort (is underway) towards moving networking bpf hooks to the lesser privileged mode CAP_NET_ADMIN, where leaking such info should be avoided. Thus, this patch to clear the info when needed. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:30 -04:00
Jesper Dangaard Brouer	44fa2dbd47	xdp: transition into using xdp_frame for ndo_xdp_xmit Changing API ndo_xdp_xmit to take a struct xdp_frame instead of struct xdp_buff. This brings xdp_return_frame and ndp_xdp_xmit in sync. This builds towards changing the API further to become a bulk API, because xdp_buff is not a queue-able object while xdp_frame is. V4: Adjust for commit `59655a5b6c` ("tuntap: XDP_TX can use native XDP") V7: Adjust for commit `d9314c474d` ("i40e: add support for XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:30 -04:00
Jesper Dangaard Brouer	039930945a	xdp: transition into using xdp_frame for return API Changing API xdp_return_frame() to take struct xdp_frame as argument, seems like a natural choice. But there are some subtle performance details here that needs extra care, which is a deliberate choice. When de-referencing xdp_frame on a remote CPU during DMA-TX completion, result in the cache-line is change to "Shared" state. Later when the page is reused for RX, then this xdp_frame cache-line is written, which change the state to "Modified". This situation already happens (naturally) for, virtio_net, tun and cpumap as the xdp_frame pointer is the queued object. In tun and cpumap, the ptr_ring is used for efficiently transferring cache-lines (with pointers) between CPUs. Thus, the only option is to de-referencing xdp_frame. It is only the ixgbe driver that had an optimization, in which it can avoid doing the de-reference of xdp_frame. The driver already have TX-ring queue, which (in case of remote DMA-TX completion) have to be transferred between CPUs anyhow. In this data area, we stored a struct xdp_mem_info and a data pointer, which allowed us to avoid de-referencing xdp_frame. To compensate for this, a prefetchw is used for telling the cache coherency protocol about our access pattern. My benchmarks show that this prefetchw is enough to compensate the ixgbe driver. V7: Adjust for commit `d9314c474d` ("i40e: add support for XDP_REDIRECT") V8: Adjust for commit `bd658dda42` ("net/mlx5e: Separate dma base address and offset in dma_sync call") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:29 -04:00
Jesper Dangaard Brouer	60bbf7eeef	mlx5: use page_pool for xdp_return_frame call This patch shows how it is possible to have both the driver local page cache, which uses elevated refcnt for "catching"/avoiding SKB put_page returns the page through the page allocator. And at the same time, have pages getting returned to the page_pool from ndp_xdp_xmit DMA completion. The performance improvement for XDP_REDIRECT in this patch is really good. Especially considering that (currently) the xdp_return_frame API and page_pool_put_page() does per frame operations of both rhashtable ID-lookup and locked return into (page_pool) ptr_ring. (It is the plan to remove these per frame operation in a followup patchset). The benchmark performed was RX on mlx5 and XDP_REDIRECT out ixgbe, with xdp_redirect_map (using devmap) . And the target/maximum capability of ixgbe is 13Mpps (on this HW setup). Before this patch for mlx5, XDP redirected frames were returned via the page allocator. The single flow performance was 6Mpps, and if I started two flows the collective performance drop to 4Mpps, because we hit the page allocator lock (further negative scaling occurs). Two test scenarios need to be covered, for xdp_return_frame API, which is DMA-TX completion running on same-CPU or cross-CPU free/return. Results were same-CPU=10Mpps, and cross-CPU=12Mpps. This is very close to our 13Mpps max target. The reason max target isn't reached in cross-CPU test, is likely due to RX-ring DMA unmap/map overhead (which doesn't occur in ixgbe to ixgbe testing). It is also planned to remove this unnecessary DMA unmap in a later patchset V2: Adjustments requested by Tariq - Changed page_pool_create return codes not return NULL, only ERR_PTR, as this simplifies err handling in drivers. 
- Save a branch in mlx5e_page_release - Correct page_pool size calc for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ V5: Updated patch desc V8: Adjust for `b0cedc844c` ("net/mlx5e: Remove rq_headroom field from params") V9: - Adjust for `121e892754` ("net/mlx5e: Refactor RQ XDP_TX indication") - Adjust for `73281b78a3` ("net/mlx5e: Derive Striding RQ size from MTU") - Correct handling if page_pool_create fail for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ V10: Req from Tariq - Change pool_size calc for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:29 -04:00
Jesper Dangaard Brouer	57d0a1c1ac	xdp: allow page_pool as an allocator type in xdp_return_frame New allocator type MEM_TYPE_PAGE_POOL for page_pool usage. The registered allocator page_pool pointer is not available directly from xdp_rxq_info, but it could be (if needed). For now, the driver should keep separate track of the page_pool pointer, which it should use for RX-ring page allocation. As suggested by Saeed, to maintain a symmetric API it is the drivers responsibility to allocate/create and free/destroy the page_pool. Thus, after the driver have called xdp_rxq_info_unreg(), it is drivers responsibility to free the page_pool, but with a RCU free call. This is done easily via the page_pool helper page_pool_destroy() (which avoids touching any driver code during the RCU callback, which could happen after the driver have been unloaded). V8: address issues found by kbuild test robot - Address sparse should be static warnings - Allow xdp.o to be compiled without page_pool.o V9: Remove inline from .c file, compiler knows best Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:29 -04:00
Jesper Dangaard Brouer	ff7d6b27f8	page_pool: refurbish version of page_pool code Need a fast page recycle mechanism for ndo_xdp_xmit API for returning pages on DMA-TX completion time, which have good cross CPU performance, given DMA-TX completion time can happen on a remote CPU. Refurbish my page_pool code, that was presented[1] at MM-summit 2016. Adapted page_pool code to not depend the page allocator and integration into struct page. The DMA mapping feature is kept, even-though it will not be activated/used in this patchset. [1] http://people.netfilter.org/hawk/presentations/MM-summit2016/generic_page_pool_mm_summit2016.pdf V2: Adjustments requested by Tariq - Changed page_pool_create return codes, don't return NULL, only ERR_PTR, as this simplifies err handling in drivers. V4: many small improvements and cleanups - Add DOC comment section, that can be used by kernel-doc - Improve fallback mode, to work better with refcnt based recycling e.g. remove a WARN as pointed out by Tariq e.g. quicker fallback if ptr_ring is empty. V5: Fixed SPDX license as pointed out by Alexei V6: Adjustments requested by Eric Dumazet - Adjust ____cacheline_aligned_in_smp usage/placement - Move rcu_head in struct page_pool - Free pages quicker on destroy, minimize resources delayed an RCU period - Remove code for forward/backward compat ABI interface V8: Issues found by kbuild test robot - Address sparse should be static warnings - Only compile+link when a driver use/select page_pool, mlx5 selects CONFIG_PAGE_POOL, although its first used in two patches Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:29 -04:00
Jesper Dangaard Brouer	8d5d885275	xdp: rhashtable with allocator ID to pointer mapping Use the IDA infrastructure for getting a cyclic increasing ID number, that is used for keeping track of each registered allocator per RX-queue xdp_rxq_info. Instead of using the IDR infrastructure, which uses a radix tree, use a dynamic rhashtable, for creating ID to pointer lookup table, because this is faster. The problem that is being solved here is that, the xdp_rxq_info pointer (stored in xdp_buff) cannot be used directly, as the guaranteed lifetime is too short. The info is needed on a (potentially) remote CPU during DMA-TX completion time. In an xdp_frame the xdp_mem_info is stored, when it got converted from an xdp_buff, which is sufficient for the simple page refcnt based recycle schemes. For more advanced allocators there is a need to store a pointer to the registered allocator. Thus, there is a need to guard the lifetime or validity of the allocator pointer, which is done through this rhashtable ID map to pointer. The removal and validity of the allocator and helper struct xdp_mem_allocator is guarded by RCU. The allocator will be created by the driver, and registered with xdp_rxq_info_reg_mem_model(). It is up-to debate who is responsible for freeing the allocator pointer or invoking the allocator destructor function. In any case, this must happen via RCU freeing. Use the IDA infrastructure for getting a cyclic increasing ID number, that is used for keeping track of each registered allocator per RX-queue xdp_rxq_info. V4: Per req of Jason Wang - Use xdp_rxq_info_reg_mem_model() in all drivers implementing XDP_REDIRECT, even-though it's not strictly necessary when allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero). V6: Per req of Alex Duyck - Introduce rhashtable_lookup() call in later patch V8: Address sparse should be static warnings (from kbuild test robot) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:29 -04:00
Jesper Dangaard Brouer	84f5e3fb79	mlx5: register a memory model when XDP is enabled Now all the users of ndo_xdp_xmit have been converted to use xdp_return_frame. This enable a different memory model, thus activating another code path in the xdp_return_frame API. V2: Fixed issues pointed out by Tariq. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:28 -04:00
Jesper Dangaard Brouer	b411ef1102	i40e: convert to use generic xdp_frame and xdp_return_frame API Also convert driver i40e, which very recently got XDP_REDIRECT support in commit `d9314c474d` ("i40e: add support for XDP_REDIRECT"). V7: This patch got added in V7 of this patchset. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:28 -04:00
Jesper Dangaard Brouer	70280ed91c	bpf: cpumap convert to use generic xdp_frame The generic xdp_frame format, was inspired by the cpumap own internal xdp_pkt format. It is now time to convert it over to the generic xdp_frame format. The cpumap needs one extra field dev_rx. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:28 -04:00
Jesper Dangaard Brouer	cac320c850	virtio_net: convert to use generic xdp_frame and xdp_return_frame API The virtio_net driver assumes XDP frames are always released based on page refcnt (via put_page). Thus, it only queues the XDP data pointer address and uses virt_to_head_page() to retrieve struct page. Use the XDP return API to get away from such assumptions. Instead queue an xdp_frame, which allows us to use the xdp_return_frame API, when releasing the frame. V8: Avoid endianness issues (found by kbuild test robot) V9: Change __virtnet_xdp_xmit from bool to int return value (found by Dan Carpenter) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:28 -04:00
Jesper Dangaard Brouer	1ffcbc8537	tun: convert to use generic xdp_frame and xdp_return_frame API The tuntap driver invented its own driver specific way of queuing XDP packets, by storing the xdp_buff information in the top of the XDP frame data. Convert it over to use the more generic xdp_frame structure. The main problem with the in-driver method is that the xdp_rxq_info pointer cannot be trusted/used when dequeueing the frame. V3: Remove check based on feedback from Jason Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:28 -04:00
Jesper Dangaard Brouer	c0048cff8a	xdp: introduce a new xdp_frame type This is needed to convert drivers tuntap and virtio_net. This is a generalization of what is done inside cpumap, which will be converted later. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:28 -04:00
Jesper Dangaard Brouer	106ca27f29	xdp: move struct xdp_buff from filter.h to xdp.h This is done to prepare for the next patch, and it is also nice to move this XDP related struct out of filter.h. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:28 -04:00
Jesper Dangaard Brouer	189ead81a8	ixgbe: use xdp_return_frame API Extend struct ixgbe_tx_buffer to store the xdp_mem_info. Notice that this could be optimized further by putting this into a union in the struct ixgbe_tx_buffer, but this patchset works towards removing this again. Thus, this is not done. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:27 -04:00
Jesper Dangaard Brouer	5ab073ffd3	xdp: introduce xdp_return_frame API and use in cpumap Introduce an xdp_return_frame API, and convert over cpumap as the first user, given it have queued XDP frame structure to leverage. V3: Cleanup and remove C99 style comments, pointed out by Alex Duyck. V6: Remove comment that id will be added later (Req by Alex Duyck) V8: Rename enum mem_type to xdp_mem_type (found by kbuild test robot) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:27 -04:00
Jesper Dangaard Brouer	5168d73201	mlx5: basic XDP_REDIRECT forward support This implements basic XDP redirect support in mlx5 driver. Notice that the ndo_xdp_xmit() is NOT implemented, because that API needs some changes that this patchset is working towards. The main purpose of this patch is to have different drivers doing XDP_REDIRECT to show how different memory models behave in a cross driver world. Update(pre-RFCv2 Tariq): Need to DMA unmap page before xdp_do_redirect, as the return API does not exist yet to keep this mapped. Update(pre-RFCv3 Saeed): Don't mix XDP_TX and XDP_REDIRECT flushing, introduce xdpsq.db.redirect_flush boolean. V9: Adjust for commit `121e892754` ("net/mlx5e: Refactor RQ XDP_TX indication") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 10:50:27 -04:00
Nicolas Dechesne	77ac725e0c	net: qrtr: add MODULE_ALIAS_NETPROTO macro To ensure that qrtr can be loaded automatically, when needed, if it is compiled as module. Signed-off-by: Nicolas Dechesne <nicolas.dechesne@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 09:58:00 -04:00
Intiyaz Basha	897ddc2483	liquidio: Enhanced ethtool stats 1. Added red_drops stats. Inbound packets dropped by RED, buffer exhaustion 2. Included fcs_err, jabber_err, l2_err and frame_err errors under rx_errors 3. Included fifo_err, dmac_drop, red_drops, fw_err_pko, fw_err_link and fw_err_drop under rx_dropped 4. Included max_collision_fail, max_deferral_fail, total_collisions, fw_err_pko, fw_err_link, fw_err_drop and fw_err_pki under tx_dropped 5. Counting dma mapping errors 6. Added some firmware stats description and removed for some Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Acked-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 09:56:52 -04:00
Stefan Hajnoczi	05e489b159	VSOCK: make af_vsock.ko removable again Commit `c1eef220c1` ("vsock: always call vsock_init_tables()") introduced a module_init() function without a corresponding module_exit() function. Modules with an init function can only be removed if they also have an exit function. Therefore the vsock module was considered "permanent" and could not be removed. This patch adds an empty module_exit() function so that "rmmod vsock" works. No explicit cleanup is required because: 1. Transports call vsock_core_exit() upon exit and cannot be removed while sockets are still alive. 2. vsock_diag.ko does not perform any action that requires cleanup by vsock.ko. Fixes: `c1eef220c1` ("vsock: always call vsock_init_tables()") Reported-by: Xiumei Mu <xmu@redhat.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-17 09:44:30 -04:00
Joerg Roedel	d6ef1f194b	x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y The walk_pte_level() function just uses __va to get the virtual address of the PTE page, but that breaks when the PTE page is not in the direct mapping with HIGHPTE=y. The result is an unhandled kernel paging request at some random address when accessing the current_kernel or current_user file. Use the correct API to access PTE pages. Fixes: `fe770bf031` ('x86: clean up the page table dumper and add 32-bit support') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: jgross@suse.com Cc: JBeulich@suse.com Cc: hpa@zytor.com Cc: aryabinin@virtuozzo.com Cc: kirill.shutemov@linux.intel.com Link: https://lkml.kernel.org/r/1523971636-4137-1-git-send-email-joro@8bytes.org	2018-04-17 15:43:01 +02:00
Alison Schofield	1340ccfa9a	x86,sched: Allow topologies where NUMA nodes share an LLC Intel's Skylake Server CPUs have a different LLC topology than previous generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided into two "slices", each containing half the cores, half the LLC, and one memory controller and each slice is enumerated to Linux as a NUMA node. This is similar to how the cores and LLC were arranged for the Cluster-On-Die (CoD) feature. CoD allowed the same cache line to be present in each half of the LLC. But, with SNC, each line is only ever present in one slice. This means that the portion of the LLC available to a CPU depends on the data being accessed: Remote socket: entire package LLC is shared Local socket->local slice: data goes into local slice LLC Local socket->remote slice: data goes into remote-slice LLC. Slightly higher latency than local slice LLC. The biggest implication from this is that a process accessing all NUMA-local memory only sees half the LLC capacity. The CPU describes its cache hierarchy with the CPUID instruction. One of the CPUID leaves enumerates the "logical processors sharing this cache". This information is used for scheduling decisions so that tasks move more freely between CPUs sharing the cache. But, the CPUID for the SNC configuration discussed above enumerates the LLC as being shared by the entire package. This is not 100% precise because the entire cache is not usable by all accesses. But, it is the way the hardware enumerates itself, and this is not likely to change. The userspace visible impact of all the above is that the sysfs info reports the entire LLC as being available to the entire package. As noted above, this is not true for local socket accesses. This patch does not correct the sysfs info. It is the same, pre and post patch. The current code emits the following warning: sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. 
The warning is coming from the topology_sane() check in smpboot.c because the topology is not matching the expectations of the model for obvious reasons. To fix this, add a vendor and model specific check to never call topology_sane() for these systems. Also, just like "Cluster-on-Die" disable the "coregroup" sched_domain_topology_level and use NUMA information from the SRAT alone. This is OK at least on the hardware we are immediately concerned about because the LLC sharing happens at both the slice and at the package level, which are also NUMA boundaries. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: brice.goglin@gmail.com Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: https://lkml.kernel.org/r/20180407002130.GA18984@alison-desk.jf.intel.com	2018-04-17 15:39:55 +02:00
Jiri Olsa	bfb3d7b8b9	perf: Remove superfluous allocation error check If the get_callchain_buffers fails to allocate the buffer it will decrease the nr_callchain_events right away. There's no point of checking the allocation error for nr_callchain_events > 1. Removing that check. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: syzkaller-bugs@googlegroups.com Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20180415092352.12403-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-04-17 09:47:40 -03:00
Jiri Olsa	5af44ca53d	perf: Fix sample_max_stack maximum check The syzbot hit KASAN bug in perf_callchain_store having the entry stored behind the allocated bounds [1]. We miss the sample_max_stack check for the initial event that allocates callchain buffers. This missing check allows to create an event with sample_max_stack value bigger than the global sysctl maximum: # sysctl -a \| grep perf_event_max_stack kernel.perf_event_max_stack = 127 # perf record -vv -C 1 -e cycles/max-stack=256/ kill ... perf_event_attr: size 112 ... sample_max_stack 256 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 4 Note the '-C 1', which forces perf record to create just single event. Otherwise it opens event for every cpu, then the sample_max_stack check fails on the second event and all's fine. The fix is to run the sample_max_stack check also for the first event with callchains. [1] https://marc.info/?l=linux-kernel&m=152352732920874&w=2 Reported-by: syzbot+7c449856228b63ac951e@syzkaller.appspotmail.com Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: syzkaller-bugs@googlegroups.com Cc: x86@kernel.org Fixes: `97c79a38cd` ("perf core: Per event callchain limit") Link: http://lkml.kernel.org/r/20180415092352.12403-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-04-17 09:47:40 -03:00
Jiri Olsa	78b562fbfa	perf: Return proper values for user stack errors Return immediately when we find issue in the user stack checks. The error value could get overwritten by following check for PERF_SAMPLE_REGS_INTR. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: syzkaller-bugs@googlegroups.com Cc: x86@kernel.org Fixes: `60e2364e60` ("perf: Add ability to sample machine state on interrupt") Link: http://lkml.kernel.org/r/20180415092352.12403-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-04-17 09:47:39 -03:00
Thomas Richter	038586c343	perf list: Add s390 support for detailed/verbose PMU event description 'perf list' with flags -d and -v print a description (-d) or a very verbose explanation (-v) of CPU specific counter events. These descriptions are provided with the json files in directory pmu-events/arch/s390/*.json. Display of these descriptions on s390 requires the corresponding json files. On s390 this does not work because function is_pmu_core() does not detect the s390 directory name where the CPU specific events are listed. On x86 it is: /sys/bus/event_source/devices/cpu whereas on s390 it is: /sys/bus/event_source/devices/cpum_cf /sys/bus/event_source/devices/cpum_sf Fix this by adding s390 directory name testing to function is_pmu_core(). This is the same approach as taken for the ARM platform. Output before: [root@s35lp76 perf]# ./perf list -d pmu List of pre-defined events (to be used in -e): cpum_cf/AES_BLOCKED_CYCLES/ [Kernel PMU event] cpum_cf/AES_BLOCKED_FUNCTIONS/ [Kernel PMU event] cpum_cf/AES_CYCLES/ [Kernel PMU event] cpum_cf/AES_FUNCTIONS/ [Kernel PMU event] .... cpum_cf/TX_NC_TEND/ [Kernel PMU event] cpum_cf/VX_BCD_EXECUTION_SLOTS/ [Kernel PMU event] cpum_sf/SF_CYCLES_BASIC/ [Kernel PMU event] Output after: [root@s35lp76 perf]# ./perf list -d pmu List of pre-defined events (to be used in -e): cpum_cf/AES_BLOCKED_CYCLES/ [Kernel PMU event] cpum_cf/AES_BLOCKED_FUNCTIONS/ [Kernel PMU event] cpum_cf/AES_CYCLES/ [Kernel PMU event] cpum_cf/AES_FUNCTIONS/ [Kernel PMU event] .... cpum_cf/TX_NC_TEND/ [Kernel PMU event] cpum_cf/VX_BCD_EXECUTION_SLOTS/ [Kernel PMU event] cpum_sf/SF_CYCLES_BASIC/ [Kernel PMU event] 3906: bcd_dfp_execution_slots [BCD DFP Execution Slots] decimal_instructions [Decimal Instructions] dtlb2_gpage_writes [DTLB2 GPAGE Writes] dtlb2_hpage_writes [DTLB2 HPAGE Writes] dtlb2_misses [DTLB2 Misses] dtlb2_writes [DTLB2 Writes] itlb2_misses [ITLB2 Misses] itlb2_writes [ITLB2 Writes] l1c_tlb2_misses [L1C TLB2 Misses] ..... 
cfvn 3: cpu_cycles [CPU Cycles] instructions [Instructions] l1d_dir_writes [L1D Directory Writes] l1d_penalty_cycles [L1D Penalty Cycles] l1i_dir_writes [L1I Directory Writes] l1i_penalty_cycles [L1I Penalty Cycles] problem_state_cpu_cycles [Problem State CPU Cycles] problem_state_instructions [Problem State Instructions] .... csvn generic: aes_blocked_cycles [AES Blocked Cycles] aes_blocked_functions [AES Blocked Functions] aes_cycles [AES Cycles] aes_functions [AES Functions] dea_blocked_cycles [DEA Blocked Cycles] dea_blocked_functions [DEA Blocked Functions] .... Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180416132314.33249-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-04-17 09:47:39 -03:00
Alexey Budankov	bf30cc1882	perf script: Extend misc field decoding with switch out event type Append 'p' sign to 'S' tag designating the type of context switch out event so 'Sp' means preemption context switch. Documentation is extended to cover new presentation changes. $ perf script --show-switch-events -F +misc -I -i perf.data: hdparm 4073 [004] U 762.198265: 380194 cycles:ppp: 7faf727f5a23 strchr (/usr/lib64/ld-2.26.so) hdparm 4073 [004] K 762.198366: 441572 cycles:ppp: ffffffffb9218435 alloc_set_pte (/lib/modules/4.16.0-rc6+/build/vmlinux) hdparm 4073 [004] S 762.198391: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 swapper 0 [004] 762.198392: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 4073/4073 swapper 0 [004] Sp 762.198477: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 4073/4073 hdparm 4073 [004] 762.198478: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 swapper 0 [007] K 762.198514: 2303073 cycles:ppp: ffffffffb98b0c66 intel_idle (/lib/modules/4.16.0-rc6+/build/vmlinux) swapper 0 [007] Sp 762.198561: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 1134/1134 kworker/u16:18 1134 [007] 762.198562: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 kworker/u16:18 1134 [007] S 762.198567: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5fc65ce7-8ca5-53ae-8858-8ddd27290575@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-04-17 09:47:39 -03:00

753602 Commits