tcp: attach SYNACK messages to request sockets instead of listener

If a listen backlog is very big (to avoid syncookies), then
the listener sk->sk_wmem_alloc is the main source of false
sharing, as we need to touch it twice per SYNACK re-transmit
and TX completion.

(One SYN packet takes listener lock once, but up to 6 SYNACK
are generated)

By attaching the skb to the request socket, we remove this
source of contention.

Tested:

 listen(fd, 10485760); // single listener (no SO_REUSEPORT)
 16 RX/TX queue NIC
 Sustain a SYNFLOOD attack of ~320,000 SYN per second,
 Sending ~1,400,000 SYNACK per second.
 Perf profiles now show listener spinlock being next bottleneck.

    20.29%  [kernel]  [k] queued_spin_lock_slowpath
    10.06%  [kernel]  [k] __inet_lookup_established
     5.12%  [kernel]  [k] reqsk_timer_handler
     3.22%  [kernel]  [k] get_next_timer_interrupt
     3.00%  [kernel]  [k] tcp_make_synack
     2.77%  [kernel]  [k] ipt_do_table
     2.70%  [kernel]  [k] run_timer_softirq
     2.50%  [kernel]  [k] ip_finish_output
     2.04%  [kernel]  [k] cascade

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
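The accounting change described above can be modelled in user space. This is only a sketch under stated assumptions, not kernel code: `sketch_counter`, `sketch_skb`, `sketch_set_owner` and `sketch_tx_complete` are hypothetical names, and a C11 `atomic_int` stands in for `sk->sk_wmem_alloc`. Each SYNACK touches its owner's counter twice, once when it is charged and once at TX completion. If the owner is the single listener, every CPU hits the same cache line; if the owner is the per-connection request socket, the listener's counter is never written.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the write-memory counter of a socket. */
struct sketch_counter {
	atomic_int wmem_alloc;
};

/* Hypothetical stand-in for an skb: it remembers which counter it charged. */
struct sketch_skb {
	struct sketch_counter *owner;
	int truesize;
};

/* First touch: charge the skb to its owner when the SYNACK is built. */
static void sketch_set_owner(struct sketch_skb *skb,
			     struct sketch_counter *owner)
{
	skb->owner = owner;
	atomic_fetch_add(&owner->wmem_alloc, skb->truesize);
}

/* Second touch: uncharge the same owner at TX completion. */
static void sketch_tx_complete(struct sketch_skb *skb)
{
	atomic_fetch_sub(&skb->owner->wmem_alloc, skb->truesize);
	skb->owner = NULL;
}
```

Passing a per-request counter to `sketch_set_owner()` instead of the listener's counter is the whole change in this model: both atomic operations then land on memory private to one connection attempt.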
Committed by: David S. Miller
Parent: 1b33bc3e9e
Commit: ca6fb06518
@@ -224,13 +224,15 @@ static struct fq_flow *fq_classify(struct sk_buff *skb, struct fq_sched_data *q)
 	if (unlikely((skb->priority & TC_PRIO_MAX) == TC_PRIO_CONTROL))
 		return &q->internal;
 
-	/* SYNACK messages are attached to a listener socket.
-	 * 1) They are not part of a 'flow' yet
-	 * 2) We do not want to rate limit them (eg SYNFLOOD attack),
+	/* SYNACK messages are attached to a TCP_NEW_SYN_RECV request socket
+	 * 1) request sockets are not full blown,
+	 *    they do not contain sk_pacing_rate
+	 * 2) They are not part of a 'flow' yet
+	 * 3) We do not want to rate limit them (eg SYNFLOOD attack),
 	 *    especially if the listener set SO_MAX_PACING_RATE
-	 * 3) We pretend they are orphaned
+	 * 4) We pretend they are orphaned
 	 */
-	if (!sk || sk->sk_state == TCP_LISTEN) {
+	if (!sk || sk->sk_state == TCP_NEW_SYN_RECV) {
 		unsigned long hash = skb_get_hash(skb) & q->orphan_mask;
 
 		/* By forcing low order bit to 1, we make sure to not
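The classification rule in the hunk above can be sketched in isolation. This is a user-space model, not the scheduler code: `sketch_sock`, `sketch_state` and `sketch_flow_key` are hypothetical names. The hunk is cut off in the middle of its last comment, so the tagging step `(hash << 1) | 1` is an assumption about how the low-order bit is forced to 1; the idea is that socket pointers are aligned and therefore have a low bit of 0, so a tagged hash cannot collide with a real socket address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical socket states, standing in for the kernel's TCP states. */
enum sketch_state {
	SKETCH_ESTABLISHED,
	SKETCH_LISTEN,
	SKETCH_NEW_SYN_RECV,
};

/* Hypothetical minimal socket: only the state matters for this sketch. */
struct sketch_sock {
	enum sketch_state state;
};

/*
 * Return the key used to look up a flow.
 * Packets with no socket, or attached to a request socket, are treated
 * as orphans: their key is a masked packet hash with the low bit set.
 * Packets from full sockets are keyed by the socket address.
 */
static uintptr_t sketch_flow_key(const struct sketch_sock *sk,
				 uint32_t skb_hash, uint32_t orphan_mask)
{
	if (!sk || sk->state == SKETCH_NEW_SYN_RECV) {
		uintptr_t hash = skb_hash & orphan_mask;

		/* Assumed tagging: shift left, then force the low bit to 1. */
		return (hash << 1) | (uintptr_t)1;
	}
	return (uintptr_t)sk;
}
```

In this model a SYNACK owned by a request socket gets the same key as a packet with no socket at all, which matches point 4 of the new comment ("We pretend they are orphaned").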