android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Pablo Neira Ayuso	b7bd1809e0	netfilter: nfnetlink_queue: get rid of nfnetlink_queue_ct.c The original intention was to avoid dependencies between nfnetlink_queue and conntrack without ifdef pollution. However, we can achieve this by moving the conntrack dependent code into ctnetlink and keep some glue code to access the nfq_ct indirection from nfqueue. After this patch, the nfq_ct indirection is always compiled in the netfilter core to avoid polluting nfqueue with ifdefs. Thus, if nf_conntrack is not compiled this results in only 8-bytes of memory waste in x86_64. This patch also adds ctnetlink_nfqueue_seqadj() to avoid that the nf_conn structure layout if exposed to nf_queue, which creates another dependency with nf_conntrack at compilation time. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-04 21:45:44 +02:00
Eric Dumazet	e96f78ab27	tcp/dccp: add SLAB_DESTROY_BY_RCU flag for request sockets Before letting request sockets being put in TCP/DCCP regular ehash table, we need to add either : - SLAB_DESTROY_BY_RCU flag to their kmem_cache - add RCU grace period before freeing them. Since we carefully respected the SLAB_DESTROY_BY_RCU protocol like ESTABLISH and TIMEWAIT sockets, use it here. req_prot_init() being only used by TCP and DCCP, I did not add a new slab_flags into their rsk_prot, but reuse prot->slab_flags Since all reqsk_alloc() users are correctly dealing with a failure, add the __GFP_NOWARN flag to avoid traces under pressure. Fixes: `079096f103` ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 13:25:20 -07:00
Daniel Borkmann	754f1e6a36	sched, bpf: make skb->priority writable {cls,act}_bpf can now set the skb->priority from an eBPF program based on various critera, so that for example classful qdiscs like multiq can update the skb's priority during enqueue time and further push it down into subsequent qdiscs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 05:02:41 -07:00
Daniel Borkmann	c46646d048	sched, bpf: add helper for retrieving routing realms Using routing realms as part of the classifier is quite useful, it can be viewed as a tag for one or multiple routing entries (think of an analogy to net_cls cgroup for processes), set by user space routing daemons or via iproute2 as an indicator for traffic classifiers and later on processed in the eBPF program. Unlike actions, the classifier can inspect device flags and enable netif_keep_dst() if necessary. tc actions don't have that possibility, but in case people know what they are doing, it can be used from there as well (e.g. via devs that must keep dsts by design anyway). If a realm is set, the handler returns the non-zero realm. User space can set the full 32bit realm for the dst. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 05:02:41 -07:00
Daniel Borkmann	a91263d520	ebpf: migrate bpf_prog's flags to bitfield As we need to add further flags to the bpf_prog structure, lets migrate both bools to a bitfield representation. The size of the base structure (excluding insns) remains unchanged at 40 bytes. Add also tags for the kmemchecker, so that it doesn't throw false positives. Even in case gcc would generate suboptimal code, it's not being accessed in performance critical paths. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 05:02:39 -07:00
Jiri Pirko	9e8f4a548a	switchdev: push object ID back to object structure Suggested-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:40 -07:00
Jiri Pirko	648b4a995a	switchdev: bring back switchdev_obj and use it as a generic object param Replace "void *obj" with a generic structure. Introduce couple of helpers along that. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:39 -07:00
Jiri Pirko	52ba57cfdc	switchdev: rename switchdev_obj_fdb to switchdev_obj_port_fdb Make the struct name in sync with object id name. Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:39 -07:00
Jiri Pirko	8f24f3095d	switchdev: rename switchdev_obj_vlan to switchdev_obj_port_vlan Make the struct name in sync with object id name. Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:38 -07:00
Jiri Pirko	1f86839874	switchdev: rename SWITCHDEV_ATTR_* enum values to SWITCHDEV_ATTR_ID_* To be aligned with obj. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:37 -07:00
Jiri Pirko	57d80838da	switchdev: rename SWITCHDEV_OBJ_* enum values to SWITCHDEV_OBJ_ID_* Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:36 -07:00
Eric Dumazet	e994b2f0fb	tcp: do not lock listener to process SYN packets Everything should now be ready to finally allow SYN packets processing without holding listener lock. Tested: 3.5 Mpps SYNFLOOD. Plenty of cpu cycles available. Next bottleneck is the refcount taken on listener, that could be avoided if we remove SLAB_DESTROY_BY_RCU strict semantic for listeners, and use regular RCU. 13.18% [kernel] [k] __inet_lookup_listener 9.61% [kernel] [k] tcp_conn_request 8.16% [kernel] [k] sha_transform 5.30% [kernel] [k] inet_reqsk_alloc 4.22% [kernel] [k] sock_put 3.74% [kernel] [k] tcp_make_synack 2.88% [kernel] [k] ipt_do_table 2.56% [kernel] [k] memcpy_erms 2.53% [kernel] [k] sock_wfree 2.40% [kernel] [k] tcp_v4_rcv 2.08% [kernel] [k] fib_table_lookup 1.84% [kernel] [k] tcp_openreq_init_rwin Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:46 -07:00
Eric Dumazet	92d6f176fd	tcp/dccp: add a reschedule point in inet_csk_listen_stop() If a listener with thousands of children in accept queue is dismantled, it can take a while to close all of them. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:45 -07:00
Eric Dumazet	ef547f2ac1	tcp: remove max_qlen_log This control variable was set at first listen(fd, backlog) call, but not updated if application tried to increase or decrease backlog. It made sense at the time listener had a non resizeable hash table. Also rounding to powers of two was not very friendly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:44 -07:00
Eric Dumazet	10cbc8f179	tcp/dccp: remove struct listen_sock It is enough to check listener sk_state, no need for an extra condition. max_qlen_log can be moved into struct request_sock_queue We can remove syn_wait_lock and the alignment it enforced. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:43 -07:00
Eric Dumazet	ca6fb06518	tcp: attach SYNACK messages to request sockets instead of listener If a listen backlog is very big (to avoid syncookies), then the listener sk->sk_wmem_alloc is the main source of false sharing, as we need to touch it twice per SYNACK re-transmit and TX completion. (One SYN packet takes listener lock once, but up to 6 SYNACK are generated) By attaching the skb to the request socket, we remove this source of contention. Tested: listen(fd, 10485760); // single listener (no SO_REUSEPORT) 16 RX/TX queue NIC Sustain a SYNFLOOD attack of ~320,000 SYN per second, Sending ~1,400,000 SYNACK per second. Perf profiles now show listener spinlock being next bottleneck. 20.29% [kernel] [k] queued_spin_lock_slowpath 10.06% [kernel] [k] __inet_lookup_established 5.12% [kernel] [k] reqsk_timer_handler 3.22% [kernel] [k] get_next_timer_interrupt 3.00% [kernel] [k] tcp_make_synack 2.77% [kernel] [k] ipt_do_table 2.70% [kernel] [k] run_timer_softirq 2.50% [kernel] [k] ip_finish_output 2.04% [kernel] [k] cascade Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:43 -07:00
Eric Dumazet	81b496b31a	tcp/dccp: shrink struct listen_sock We no longer use hash_rnd, nr_table_entries and syn_table[] For a listener with a backlog of 10 millions sockets, this saves 80 MBytes of vmalloced memory. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:42 -07:00
Eric Dumazet	079096f103	tcp/dccp: install syn_recv requests into ehash table In this patch, we insert request sockets into TCP/DCCP regular ehash table (where ESTABLISHED and TIMEWAIT sockets are) instead of using the per listener hash table. ACK packets find SYN_RECV pseudo sockets without having to find and lock the listener. In nominal conditions, this halves pressure on listener lock. Note that this will allow for SO_REUSEPORT refinements, so that we can select a listener using cpu/numa affinities instead of the prior 'consistent hash', since only SYN packets will apply this selection logic. We will shrink listen_sock in the following patch to ease code review. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ying Cai <ycai@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:41 -07:00
Eric Dumazet	2feda34192	tcp/dccp: remove inet_csk_reqsk_queue_added() timeout argument This is no longer used. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:40 -07:00
Eric Dumazet	aa3a0c8ce6	tcp: get_openreq[46]() changes When request sockets are no longer in a per listener hash table but on regular TCP ehash, we need to access listener uid through req->rsk_listener get_openreq6() also gets a const for its request socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:40 -07:00
Eric Dumazet	9cfd08601f	tcp: remove BUG_ON() in tcp_check_req() Once listener is lockless, its sk_state can change anytime. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:39 -07:00
Eric Dumazet	ba8e275a45	tcp: cleanup tcp_v[46]_inbound_md5_hash() We'll soon have to call tcp_v[46]_inbound_md5_hash() twice. Also add const attribute to the socket, as it might be the unlocked listener for SYN packets. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:38 -07:00
Eric Dumazet	38cb52455c	tcp: call sk_mark_napi_id() on the child, not the listener This fixes a typo : We want to store the NAPI id on child socket. Presumably nobody really uses busy polling, on short lived flows. Fixes: `3d97379a67` ("tcp: move sk_mark_napi_id() at the right place") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:37 -07:00
Eric Dumazet	8d2675f1e4	tcp: move synflood_warned into struct request_sock_queue long term plan is to remove struct listen_sock when its hash table is no longer there. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:37 -07:00
Eric Dumazet	aac065c50a	tcp: move qlen/young out of struct listen_sock qlen_inc & young_inc were protected by listener lock, while qlen_dec & young_dec were atomic fields. Everything needs to be atomic for upcoming lockless listener. Also move qlen/young in request_sock_queue as we'll get rid of struct listen_sock eventually. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:36 -07:00
Eric Dumazet	fff1f3001c	tcp: add a spinlock to protect struct request_sock_queue struct request_sock_queue fields are currently protected by the listener 'lock' (not a real spinlock) We need to add a private spinlock instead, so that softirq handlers creating children do not have to worry with backlog notion that the listener 'lock' carries. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:36 -07:00
Samuel Ortiz	21d19f87d4	NFC: nci: Use __nci_request for exported routines Since we do not know in which context drivers will call these routines, they should use the unlocked version of nci_request, i.e. __nci_request. It is up to drivers to know/decide if they need to take the req_lock mutex before calling those routines. When being called from the NCI setup routine there is no need to do so as ops->setup is called under req_lock. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-03 05:18:55 +02:00
Dan Carpenter	aa6555622c	nl802154: Missing return in nl802154_add_llsec_key() There was a missing return here so it meant that often ieee802154_llsec_parse_key_id() was not called. Fixes: `a26c5fd762` ('nl802154: add support for security layer') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-02 23:34:37 +02:00
Trond Myklebust	8dbb09570d	Merge tag 'nfs-rdma-for-4.3-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma NFS: NFSoRDMA bugfix Fixes a use-after-free bug. Signed-off-by: Anna Schumaker <Anna.Schumaker@netapp.com>	2015-10-02 15:49:33 -04:00
David S. Miller	f6d3125fa3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/dsa/slave.c net/dsa/slave.c simply had overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-02 07:21:25 -07:00
Linus Torvalds	3deaa4f531	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix regression in SKB partial checksum handling, from Pravin B Shalar. 2) Fix VLAN inside of VXLAN handling in i40e driver, from Jesse Brandeburg. 3) Cure softlockups during accept() in SCTP, from Karl Heiss. 4) MSG_PEEK should return multiple SKBs worth of data in AF_UNIX, from Aaron Conole. 5) IPV6 erroneously ignores output interface specifier in lookup key for route lookups, fix from David Ahern. 6) In Marvell DSA driver, forward unknown frames to CPU port, from Andrew Lunn. 7) Mission flow flag initializations in some code paths, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: Initialize flow flags in input path net: dsa: fix preparation of a port STP update testptp: Silence compiler warnings on ppc64 net/mlx4: Handle return codes in mlx4_qp_attach_common dsa: mv88e6xxx: Enable forwarding for unknown to the CPU port skbuff: Fix skb checksum partial check. net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set net sysfs: Print link speed as signed integer bna: fix error handling af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag af_unix: Convert the unix_sk macro to an inline function for type safety net: sctp: Don't use 64 kilobyte lookup table for four elements l2tp: protect tunnel->del_work by ref_count net/ibm/emac: bump version numbers for correct work with ethtool sctp: Prevent soft lockup when sctp_accept() is called during a timeout event sctp: Whitespace fix i40e/i40evf: check for stopped admin queue i40e: fix VLAN inside VXLAN r8169: fix handling rtl_readphy result net: hisilicon: fix handling platform_get_irq result	2015-10-01 21:55:35 -04:00
Nikolay Aleksandrov	248234ca02	bridge: vlan: don't pass flags when creating context only We should not pass the original flags when creating a context vlan only because they may contain some flags that change behaviour in the bridge. The new global context should be with minimal set of flags, so pass 0 and let br_vlan_add() set the master flag only. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-01 18:24:05 -07:00
Nikolay Aleksandrov	263344e64c	bridge: vlan: fix possible null ptr derefs on port init and deinit When a new port is being added we need to make vlgrp available after rhashtable has been initialized and when removing a port we need to flush the vlans and free the resources after we're sure noone can use the port, i.e. after it's removed from the port list and synchronize_rcu is executed. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-01 18:24:05 -07:00
Nikolay Aleksandrov	77751ee8ae	bridge: vlan: move pvid inside net_bridge_vlan_group One obvious way to converge more code (which was also used by the previous vlan code) is to move pvid inside net_bridge_vlan_group. This allows us to simplify some and remove other port-specific functions. Also gives us the ability to simply pass the vlan group and use all of the contained information. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-01 18:24:04 -07:00
Nikolay Aleksandrov	468e794458	bridge: vlan: fix possible null vlgrp deref while registering new port While a new port is being initialized the rx_handler gets set, but the vlans get initialized later in br_add_if() and in that window if we receive a frame with a link-local address we can try to dereference p->vlgrp in: br_handle_frame() -> br_handle_local_finish() -> br_should_learn() Fix this by checking vlgrp before using it. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-01 18:24:04 -07:00
Nikolay Aleksandrov	8af78b6487	bridge: vlan: adjust rhashtable initial size and hash locks size As Stephen pointed out the default initial size is more than we need, so let's start small (4 elements, thus nelem_hint = 3). Also limit the hash locks to the number of CPUs as we don't need any write-side scaling and this looks like the minimum. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-01 18:24:03 -07:00
Linus Torvalds	46c8217c4a	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: - Fixes for mlx5 related issues - Fixes for ipoib multicast handling * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/ipoib: increase the max mcast backlog queue IB/ipoib: Make sendonly multicast joins create the mcast group IB/ipoib: Expire sendonly multicast joins IB/mlx5: Remove pa_lkey usages IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY IB/iser: Add module parameter for always register memory xprtrdma: Replace global lkey with lkey local to PD	2015-10-01 16:38:52 -04:00
Eric W. Biederman	b1842ffddf	ipv6: Add missing newline to __xfrm6_output_finish Add a newline between variable declarations and the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-10-01 11:43:29 -05:00
Alexander Aring	5f509239ec	ieee802154: handle datagram variables as u16 This reverts commit 9abc378c66e3d6f437eed77c1c534cbc183523f7 ("ieee802154: 6lowpan: change datagram var types"). The reason is that I forgot the IPv6 fragmentation here. Our MTU of lowpan interface is 1280 and skb->len should not above of that. If we reach a payload above 1280 in IPv6 header then we have a IPv6 fragmentation above 802.15.4 6LoWPAN fragmentation. The type "u16" was fine, instead I added now a WARN_ON_ONCE if skb->len is above MTU which should never happen otherwise IPv6 on minimum MTU size is broken. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-01 13:38:22 +02:00
Eric W. Biederman	b59f2e31b8	ipvs: Don't protect ip_vs_addr_is_unicast with CONFIG_SYSCTL I arranged the code so that the compiler can remove the unecessary bits in ip_vs_leave when CONFIG_SYSCTL is unset, and removed an explicit CONFIG_SYSCTL. Unfortunately when rebasing my work on top of that of Alex Gartrell I missed the fact that the newly added function ip_vs_addr_is_unicast was surrounded by CONFIG_SYSCTL. So remove the now unnecessary CONFIG_SYSCTL guards around ip_vs_addr_is_unicast. It is causing build failures today when CONFIG_SYSCTL is not selected and any self respecting compiler will notice that sysctl_cache_bypass is always false without CONFIG_SYSCTL and not include the logic from the function ip_vs_addr_is_unicast in the compiled code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-10-01 15:05:15 +09:00
Pablo Neira Ayuso	6ece90f9a1	netfilter: fix Kconfig dependencies for nf_dup_ipv{4,6} net/built-in.o: In function `nf_dup_ipv4': (.text+0xed24d): undefined reference to `nf_conntrack_untracked' net/built-in.o: In function `nf_dup_ipv4': (.text+0xed267): undefined reference to `nf_conntrack_untracked' net/built-in.o: In function `nf_dup_ipv6': (.text+0x158aef): undefined reference to `nf_conntrack_untracked' net/built-in.o: In function `nf_dup_ipv6': (.text+0x158b09): undefined reference to `nf_conntrack_untracked' Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-01 00:19:54 +02:00
Santosh Shilimkar	9b9acde7e8	RDS: Use per-bucket rw lock for bind hash-table One global lock protecting hash-tables with 1024 buckets isn't efficient and it shows up in a massive systems with truck loads of RDS sockets serving multiple databases. The perf data clearly highlights the contention on the rw lock in these massive workloads. When the contention gets worse, the code gets into a state where it decides to back off on the lock. So while it has disabled interrupts, it sits and backs off on this lock get. This causes the system to become sluggish and eventually all sorts of bad things happen. The simple fix is to move the lock into the hash bucket and use per-bucket lock to improve the scalability. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-09-30 12:43:25 -04:00
Santosh Shilimkar	2812695988	RDS: fix rds_sock reference bug while doing bind One need to take rds socket reference while using it and release it once done with it. rds_add_bind() code path does not do that so lets fix it. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-09-30 12:43:25 -04:00
Santosh Shilimkar	8b0a6b461e	RDS: make socket bind/release locking scheme simple and more efficient RDS bind and release locking scheme is very inefficient. It uses RCU for maintaining the bind hash-table which is great but it also needs to hold spinlock for [add/remove]_bound(). So overall usecase, the hash-table concurrent speedup doesn't pay off. In fact blocking nature of synchronize_rcu() makes the RDS socket shutdown too slow which hurts RDS performance since connection shutdown and re-connect happens quite often to maintain the RC part of the protocol. So we make the locking scheme simpler and more efficient by replacing spin_locks with reader/writer locks and getting rid off rcu for bind hash-table. In subsequent patch, we also covert the global lock with per-bucket lock to reduce the global lock contention. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-09-30 12:43:24 -04:00
Santosh Shilimkar	59fe460674	RDS: use kfree_rcu in rds_ib_remove_ipaddr synchronize_rcu() slowing down un-necessarily the socket shutdown path. It is used just kfree() the ip addresses in rds_ib_remove_ipaddr() which is perfect usecase for kfree_rcu(); So lets use that to gain some speedup. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>	2015-09-30 12:43:24 -04:00
Alexander Aring	1c64f147d3	ieee802154: 6lowpan: add tx/rx stats This patch adds support for increment transmit and receive stats. The meaning of these stats are IPv6 based, which shows the stats after running the 6lowpan adaptation layer (uncompression/compression, fragmentation handling) on receive and before the adaptation layer when transmit. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-30 13:23:57 +02:00
Alexander Aring	4bc8fbc95e	ieee802154: 6lowpan: don't skip first dsn while fragmentation This patch fixes the data frame sequence numer (dsn) while 6lowpan fragmentation for frag1. Currently we create one 802.15.4 header at first, then check if it's match into one frame and at the end construct many fragments and calling wpan_dev_hard_header for each of them, inclusive for the first fragment. This will make the first generated header to garbage, instead we copying this header for frag1 instead of generate a new one which skips one dsn. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-30 13:23:57 +02:00
Alexander Aring	72d53b1162	ieee802154: 6lowpan: change datagram var types This patch changes datagram size variable from u16 type to unsigned int. The reason is that an IPv6 header has an MAX_UIN16 payload length, but the datagram size is payload + IPv6 header length. This avoids overflows at some places. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-30 13:23:57 +02:00
Alexander Aring	b40988c438	ieee802154: change mtu size behaviour This patch changes the mtu size of 802.15.4 interfaces. The current setting is the meaning of the maximum transport unit with mac header, which is 127 bytes according 802.15.4. The linux meaning of the mtu size field is the maximum payload of a mac frame. Like in ethernet, which is 1500 bytes. We have dynamic length of mac frames in 802.15.4, this is why we assume the minimum header length which is hard_header_len. This contains fc and sequence fields. These can evaluated by driver layer without additional checks. We currently don't support to set the FCS from userspace, so we need to subtract this from mtu size as well. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-30 13:21:32 +02:00
Alexander Aring	d58a2fa903	mac802154: add comments for llsec issues While doing a little test with the llsec implementation I saw these issues. We should move decryption and encruption somewhere else, otherwise while capturing with wireshark the mac header shows secuirty fields but the payload is plaintext. A complete other issue is what doing with HardMAC drivers where the payload is always plaintext. I think we need a special handling then in userspace. We currently doesn't support any HardMAC transceivers, so we should fix the first issue for SoftMAC transceivers. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-30 13:16:44 +02:00

... 105 106 107 108 109 ...

44719 Commits