android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Marek Lindner	0abf5d8117	batman-adv: mark debug_log struct as bat_priv only struct Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-01-19 21:18:10 +08:00
Marek Lindner	b6d0ab7ca3	batman-adv: align kernel doc properly Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2013-01-19 21:18:09 +08:00
Antonio Quartulli	7241444209	batman-adv: a delayed_work has to be initialised once A delayed_work struct does not need to be initialized each every time before being enqueued. Therefore the INIT_DELAYED_WORK() macro should be used during the initialization process only. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>	2013-01-19 21:18:09 +08:00
Joe Millenbach	4f73bc4dd3	tty: Added a CONFIG_TTY option to allow removal of TTY The option allows you to remove TTY and compile without errors. This saves space on systems that won't support TTY interfaces anyway. bloat-o-meter output is below. The bulk of this patch consists of Kconfig changes adding "depends on TTY" to various serial devices and similar drivers that require the TTY layer. Ideally, these dependencies would occur on a common intermediate symbol such as SERIO, but most drivers "select SERIO" rather than "depends on SERIO", and "select" does not respect dependencies. bloat-o-meter output comparing our previous minimal to new minimal by removing TTY. The list is filtered to not show removed entries with awk '$3 != "-"' as the list was very long. add/remove: 0/226 grow/shrink: 2/14 up/down: 6/-35356 (-35350) function old new delta chr_dev_init 166 170 +4 allow_signal 80 82 +2 static.__warned 143 142 -1 disallow_signal 63 62 -1 __set_special_pids 95 94 -1 unregister_console 126 121 -5 start_kernel 546 541 -5 register_console 593 588 -5 copy_from_user 45 40 -5 sys_setsid 128 120 -8 sys_vhangup 32 19 -13 do_exit 1543 1526 -17 bitmap_zero 60 40 -20 arch_local_irq_save 137 117 -20 release_task 674 652 -22 static.spin_unlock_irqrestore 308 260 -48 Signed-off-by: Joe Millenbach <jmillenbach@gmail.com> Reviewed-by: Jamey Sharp <jamey@minilop.net> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-18 16:15:27 -08:00
Johannes Berg	a65240c101	mac80211: allow drivers to access IPv6 information To be able to implement NS response offloading (in regular operation or while in WoWLAN) drivers need to know the IPv6 addresses assigned to interfaces. Implement an IPv6 notifier in mac80211 to call the driver when addresses change. Unlike for IPv4, implement it as a callback rather than as a list in the BSS configuration, that is more flexible. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-01-18 21:55:38 +01:00
Johannes Berg	0a214d3f7e	mac80211: improve aggregation debug messages A lot of the aggregation messages don't indicate the station so they're hard to understand if there are multiple sessions in progress. Make that easier by adding the MAC address to most messages. Also add the TID if it wasn't already there. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-01-18 21:55:15 +01:00
Johannes Berg	0f19b41e22	mac80211: remove ARP filter enable/disable logic Depending on the driver, having ARP filtering for some addresses may be possible. Remove the logic that tracks whether ARP filter is enabled or not and give the driver the total number of addresses instead of the length of the list so it can make its own decision. Reviewed-by: Luciano Coelho <coelho@ti.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-01-18 21:20:34 +01:00
Hannes Frederic Sowa	1ad759d847	ipv6: remove unneeded check to pskb_may_pull in ipip6_rcv This is already checked by the caller (tunnel64_rcv) and brings ipip6_rcv in line with ipip_rcv. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-18 14:43:51 -05:00
YOSHIFUJI Hideaki / 吉藤英明	115b0aa6b4	ndisc: Check NS message length before access. Check message length before accessing "target" field, as we do for other types. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-18 14:41:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明	12fd84f438	ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers. Because of rt->n removal, we do not need neigh argument any more. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-18 14:41:13 -05:00
Steffen Klassert	6f809da27c	ipv6: Add an error handler for icmp6 pmtu and redirect events are now handled in the protocols error handler, so add an error handler for icmp6 to do this. It is needed in the case when we have no socket context. Based on a patch by Duan Jiong. Reported-by: Duan Jiong <djduanjiong@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-18 14:19:42 -05:00
Alan Ott	ee21c7e0d1	6lowpan: Handle uncompressed IPv6 packets over 6LoWPAN Handle the reception of uncompressed packets (dispatch type = IPv6). Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-18 14:18:30 -05:00
Alan Ott	0c446212c4	6lowpan: Refactor packet delivery into a function Refactor the handing of the skb's to the individual lowpan devices into a function. Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-18 14:18:30 -05:00
Yoni Divinsky	de5fad8157	mac80211: add op to configure default key id There are hardwares which support offload of data packets for example when auto ARP is enabled the hw will send the ARP response. In such cases if WEP encryption is configured the hw must know the default WEP key in order to encrypt the packets correctly. When hw_accel is enabled and encryption type is set to WEP, the driver should get the default key index from mac80211. Signed-off-by: Yoni Divinsky <yoni.divinsky@ti.com> [cleanups, fixes, documentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-01-18 13:30:21 +01:00
Nickolai Zeldovich	e2f6725917	net/xfrm/xfrm_replay: avoid division by zero All of the xfrm_replay->advance functions in xfrm_replay.c check if x->replay_esn->replay_window is zero (and return if so). However, one of them, xfrm_replay_advance_bmp(), divides by that value (in the '%' operator) before doing the check, which can potentially trigger a divide-by-zero exception. Some compilers will also assume that the earlier division means the value cannot be zero later, and thus will eliminate the subsequent zero check as dead code. This patch moves the division to after the check. Signed-off-by: Nickolai Zeldovich <nickolai@csail.mit.edu> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-01-18 06:19:49 +01:00
Johan Hedberg	46818ed514	Bluetooth: Fix using system-global workqueue when not necessary There's a per-HCI device workqueue (hdev->workqueue) that should be used for general per-HCI device work (except hdev->req_workqueue that's for hci_request() related work). This patch fixes places using the system-global work queue and makes them use the hdev->workqueue instead. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-01-18 02:58:37 -02:00
Johan Hedberg	1920257316	Bluetooth: Use req_workqueue for hci_request operations This patch converts work assignment relying on hci_request() from the system-global work queue to the per-HCI device specific work queue (hdev->req_workqueue) intended for hci_request() related tasks. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-01-18 02:56:20 -02:00
Johan Hedberg	6ead1bbc38	Bluetooth: Add a new workqueue for hci_request operations The hci_request function is blocking and cannot be called through the usual per-HCI device workqueue (hdev->workqueue). While hci_request is in progress any other work from the queue, including sending HCI commands to the controller would be blocked and eventually cause the hci_request call to time out. This patch adds a second workqueue to be used by operations needing hci_request and thereby avoiding issues with blocking other workqueue users. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-01-18 02:54:21 -02:00
Greg Kroah-Hartman	ed408f7c0f	Merge 3.9-rc4 into driver-core-next This is to fix up a build problem with a wireless driver due to the dynamic-debug patches in this branch. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-17 19:48:18 -08:00
Neil Horman	2f94aabd9f	sctp: refactor sctp_outq_teardown to insure proper re-initalization Jamie Parsons reported a problem recently, in which the re-initalization of an association (The duplicate init case), resulted in a loss of receive window space. He tracked down the root cause to sctp_outq_teardown, which discarded all the data on an outq during a re-initalization of the corresponding association, but never reset the outq->outstanding_data field to zero. I wrote, and he tested this fix, which does a proper full re-initalization of the outq, fixing this problem, and hopefully future proofing us from simmilar issues down the road. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Jamie Parsons <Jamie.Parsons@metaswitch.com> Tested-by: Jamie Parsons <Jamie.Parsons@metaswitch.com> CC: Jamie Parsons <Jamie.Parsons@metaswitch.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:39:56 -05:00
YOSHIFUJI Hideaki / 吉藤英明	887c95cc1d	ipv6: Complete neighbour entry removal from dst_entry. CC: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	6fd6ce2056	ipv6: Do not depend on rt->n in ip6_finish_output2(). If neigh is not found, create new one. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	707be1ff3d	ipv6: Do not depend on rt->n in ip6_dst_lookup_tail(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	2152caea71	ipv6: Do not depend on rt->n in rt6_probe(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	145a36217a	ipv6: Do not depend on rt->n in rt6_check_neigh(). CC: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	c440f1609b	ipv6: Do not depend on rt->n in ip6_pol_route(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:19 -05:00
YOSHIFUJI Hideaki / 吉藤英明	dd0cbf29b1	ipv6 route: Dump gateway based on RTF_GATEWAY flag and rt->rt6i_gateway. Do not depend on rt->n. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明	8e022ee63f	ndisc: Remove tbl argument for __ipv6_neigh_lookup(). We can refer to nd_tbl directly. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:18 -05:00
YOSHIFUJI Hideaki / 吉藤英明	7ff74a596b	ndisc: Update neigh->updated with write lock. neigh->nud_state and neigh->updated are under protection of neigh->lock. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 18:38:18 -05:00
stephen hemminger	5a406b0cdf	netfilter: nf_ct_snmp: add include file Prototype for nf_nat_snmp_hook is in nf_conntrack_snmp.h therefore it should be included to get type checking. Found by sparse. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-01-18 00:28:18 +01:00
Florian Westphal	9b21f6a909	netfilter: ctnetlink: allow userspace to modify labels Add the ability to set/clear labels assigned to a conntrack via ctnetlink. To allow userspace to only alter specific bits, Pablo suggested to add a new CTA_LABELS_MASK attribute: The new set of active labels is then determined via active = (active & ~mask) ^ changeset i.e., the mask selects those bits in the existing set that should be changed. This follows the same method already used by MARK and CONNMARK targets. Omitting CTA_LABELS_MASK is the same as setting all bits in CTA_LABELS_MASK to 1: The existing set is replaced by the one from userspace. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-01-18 00:28:17 +01:00
Florian Westphal	0ceabd8387	netfilter: ctnetlink: deliver labels to userspace Introduce CTA_LABELS attribute to send a bit-vector of currently active labels to userspace. Future patch will permit userspace to also set/delete active labels. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-01-18 00:28:16 +01:00
Florian Westphal	c539f01717	netfilter: add connlabel conntrack extension similar to connmarks, except labels are bit-based; i.e. all labels may be attached to a flow at the same time. Up to 128 labels are supported. Supporting more labels is possible, but requires increasing the ct offset delta from u8 to u16 type due to increased extension sizes. Mapping of bit-identifier to label name is done in userspace. The extension is enabled at run-time once "-m connlabel" netfilter rules are added. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-01-18 00:28:15 +01:00
Alex Elder	ae7ca4a35b	libceph: pass num_op with ops Both ceph_osdc_alloc_request() and ceph_osdc_build_request() are provided an array of ceph osd request operations. Rather than just passing the number of operations in the array, the caller is required append an additional zeroed operation structure to signal the end of the array. All callers know the number of operations at the time these functions are called, so drop the silly zero entry and supply that number directly. As a result, get_num_ops() is no longer needed. This also means that ceph_osdc_alloc_request() never uses its ops argument, so that can be dropped. Also rbd_create_rw_ops() no longer needs to add one to reserve room for the additional op. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:57 -06:00
Alex Elder	54a5400721	libceph: don't set pages or bio in ceph_osdc_alloc_request() Only one of the two callers of ceph_osdc_alloc_request() provides page or bio data for its payload. And essentially all that function was doing with those arguments was assigning them to fields in the osd request structure. Simplify ceph_osdc_alloc_request() by having the caller take care of making those assignments Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:57 -06:00
Alex Elder	d178a9e740	libceph: don't set flags in ceph_osdc_alloc_request() The only thing ceph_osdc_alloc_request() really does with the flags value it is passed is assign it to the newly-created osd request structure. Do that in the caller instead. Both callers subsequently call ceph_osdc_build_request(), so have that function (instead of ceph_osdc_alloc_request()) issue a warning if a request comes through with neither the read nor write flags set. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:05 -06:00
Alex Elder	e75b45cf36	libceph: drop osdc from ceph_calc_raw_layout() The osdc parameter to ceph_calc_raw_layout() is not used, so get rid of it. Consequently, the corresponding parameter in calc_layout() becomes unused, so get rid of that as well. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:05 -06:00
Alex Elder	4d6b250bf1	libceph: drop snapid in ceph_calc_raw_layout() A snapshot id must be provided to ceph_calc_raw_layout() even though it is not needed at all for calculating the layout. Where the snapshot id is needed is when building the request message for an osd operation. Drop the snapid parameter from ceph_calc_raw_layout() and pass that value instead in ceph_osdc_build_request(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:05 -06:00
Alex Elder	e8afad656c	libceph: pass length to ceph_calc_file_object_mapping() ceph_calc_file_object_mapping() takes (among other things) a "file" offset and length, and based on the layout, determines the object number ("bno") backing the affected portion of the file's data and the offset into that object where the desired range begins. It also computes the size that should be used for the request--either the amount requested or something less if that would exceed the end of the object. This patch changes the input length parameter in this function so it is used only for input. That is, the argument will be passed by value rather than by address, so the value provided won't get updated by the function. The value would only get updated if the length would surpass the current object, and in that case the value it got updated to would be exactly that returned in *oxlen. Only one of the two callers is affected by this change. Update ceph_calc_raw_layout() so it records any updated value. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:04 -06:00
Alex Elder	0120be3c60	libceph: pass length to ceph_osdc_build_request() The len argument to ceph_osdc_build_request() is set up to be passed by address, but that function never updates its value so there's no need to do this. Tighten up the interface by passing the length directly. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:04 -06:00
Alex Elder	5b9d1b1cd4	libceph: kill op_needs_trail() Since every osd message is now prepared to include trailing data, there's no need to check ahead of time whether any operations will make use of the trail portion of the message. We can drop the second argument to get_num_ops(), and as a result we can also get rid of op_needs_trail() which is no longer used. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:03 -06:00
Alex Elder	c885837f7d	libceph: always allow trail in osd request An osd request structure contains an optional trail portion, which if present will contain data to be passed in the payload portion of the message containing the request. The trail field is a ceph_pagelist pointer, and if null it indicates there is no trail. A ceph_pagelist structure contains a length field, and it can legitimately hold value 0. Make use of this to change the interpretation of the "trail" of an osd request so that every osd request has trailing data, it just might have length 0. This means we change the r_trail field in a ceph_osd_request structure from a pointer to a structure that is always initialized. Note that in ceph_osdc_start_request(), the trail pointer (or now address of that structure) is assigned to a ceph message's trail field. Here's why that's still OK (looking at net/ceph/messenger.c): - What would have resulted in a null pointer previously will now refer to a 0-length page list. That message trail pointer is used in two functions, write_partial_msg_pages() and out_msg_pos_next(). - In write_partial_msg_pages(), a null page list pointer is handled the same as a message with 0-length trail, and both result in a "in_trail" variable set to false. The trail pointer is only used if in_trail is true. - The only other place the message trail pointer is used is out_msg_pos_next(). That function is only called by write_partial_msg_pages() and only touches the trail pointer if the in_trail value it is passed is true. Therefore a null ceph_msg->trail pointer is equivalent to a non-null pointer referring to a 0-length page list structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:03 -06:00
Alex Elder	af77f26caa	rbd: drop oid parameters from ceph_osdc_build_request() The last two parameters to ceph_osd_build_request() describe the object id, but the values passed always come from the osd request structure whose address is also provided. Get rid of those last two parameters. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:01 -06:00
Kevin Cernekee	7266507d89	netfilter: nf_ct_sip: support Cisco 7941/7945 IP phones Most SIP devices use a source port of 5060/udp on SIP requests, so the response automatically comes back to port 5060: phone_ip:5060 -> proxy_ip:5060 REGISTER proxy_ip:5060 -> phone_ip:5060 100 Trying The newer Cisco IP phones, however, use a randomly chosen high source port for the SIP request but expect the response on port 5060: phone_ip:49173 -> proxy_ip:5060 REGISTER proxy_ip:5060 -> phone_ip:5060 100 Trying Standard Linux NAT, with or without nf_nat_sip, will send the reply back to port 49173, not 5060: phone_ip:49173 -> proxy_ip:5060 REGISTER proxy_ip:5060 -> phone_ip:49173 100 Trying But the phone is not listening on 49173, so it will never see the reply. This patch modifies nf_*_sip to work around this quirk by extracting the SIP response port from the Via: header, iff the source IP in the packet header matches the source IP in the SIP request. Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-01-17 21:12:44 +01:00
Alex Elder	c3acb18196	libceph: reformat __reset_osd() Reformat __reset_osd() into three distinct blocks of code handling the three return cases. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 14:07:44 -06:00
Jesper Dangaard Brouer	c2a936600f	net: increase fragment memory usage limits Increase the amount of memory usage limits for incomplete IP fragments. Arguing for new thresh high/low values: High threshold = 4 MBytes Low threshold = 3 MBytes The fragmentation memory accounting code, tries to account for the real memory usage, by measuring both the size of frag queue struct (inet_frag_queue (ipv4:ipq/ipv6:frag_queue)) and the SKB's truesize. We want to be able to handle/hold-on-to enough fragments, to ensure good performance, without causing incomplete fragments to hurt scalability, by causing the number of inet_frag_queue to grow too much (resulting longer searches for frag queues). For IPv4, how much memory does the largest frag consume. Maximum size fragment is 64K, which is approx 44 fragments with MTU(1500) sized packets. Sizeof(struct ipq) is 200. A 1500 byte packet results in a truesize of 2944 (not 2048 as I first assumed) (442944)+200 = 129736 bytes The current default high thresh of 262144 bytes, is obviously problematic, as only two 64K fragments can fit in the queue at the same time. How many 64K fragment can we fit into 4 MBytes: 42^20/((442944)+200) = 32.34 fragment in queues An attacker could send a separate/distinct fake fragment packets per queue, causing us to allocate one inet_frag_queue per packet, and thus attacking the hash table and its lists. How many frag queue do we need to store, and given a current hash size of 64, what is the average list length. Using one MTU sized fragment per inet_frag_queue, each consuming (2944+200) 3144 bytes. 42^20/(2944+200) = 1334 frag queues -> 21 avg list length An attack could send small fragments, the smallest packet I could send resulted in a truesize of 896 bytes (I'm a little surprised by this). 4*2^20/(896+200) = 3827 frag queues -> 59 avg list length When increasing these number, we also need to followup with improvements, that is going to help scalability. Simply increasing the hash size, is not enough as the current implementation does not have a per hash bucket locking. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 14:29:53 -05:00
Sage Weil	7d7c1f6136	crush: avoid recursion if we have already collided This saves us some cycles, but does not affect the placement result at all. This corresponds to ceph.git commit 4abb53d4f. Signed-off-by: Sage Weil <sage@inktank.com>	2013-01-17 12:42:39 -06:00
Jim Schutt	1604f488ac	libceph: for chooseleaf rules, retry CRUSH map descent from root if leaf is failed Add libceph support for a new CRUSH tunable recently added to Ceph servers. Consider the CRUSH rule step chooseleaf firstn 0 type <node_type> This rule means that <n> replicas will be chosen in a manner such that each chosen leaf's branch will contain a unique instance of <node_type>. When an object is re-replicated after a leaf failure, if the CRUSH map uses a chooseleaf rule the remapped replica ends up under the <node_type> bucket that held the failed leaf. This causes uneven data distribution across the storage cluster, to the point that when all the leaves but one fail under a particular <node_type> bucket, that remaining leaf holds all the data from its failed peers. This behavior also limits the number of peers that can participate in the re-replication of the data held by the failed leaf, which increases the time required to re-replicate after a failure. For a chooseleaf CRUSH rule, the tree descent has two steps: call them the inner and outer descents. If the tree descent down to <node_type> is the outer descent, and the descent from <node_type> down to a leaf is the inner descent, the issue is that a down leaf is detected on the inner descent, so only the inner descent is retried. In order to disperse re-replicated data as widely as possible across a storage cluster after a failure, we want to retry the outer descent. So, fix up crush_choose() to allow the inner descent to return immediately on choosing a failed leaf. Wire this up as a new CRUSH tunable. Note that after this change, for a chooseleaf rule, if the primary OSD in a placement group has failed, choosing a replacement may result in one of the other OSDs in the PG colliding with the new primary. This requires that OSD's data for that PG to need moving as well. This seems unavoidable but should be relatively rare. This corresponds to ceph.git commit 88f218181a9e6d2292e2697fc93797d0f6d6e5dc. Signed-off-by: Jim Schutt <jaschut@sandia.gov> Reviewed-by: Sage Weil <sage@inktank.com>	2013-01-17 12:42:39 -06:00
Yan, Zheng	a41bad1a9b	ceph: re-calculate truncate_size for strip object Otherwise osd may truncate the object to larger size. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-01-17 12:42:37 -06:00
John W. Linville	811477de5f	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2013-01-17 12:07:44 -05:00

... 12 13 14 15 16 ...

27150 Commits