Commit Graph

34799 Commits

Author SHA1 Message Date
Tom Herbert
23461551c0 fou: Support for foo-over-udp RX path
This patch provides a receive path for foo-over-udp. This allows
direct encapsulation of IP protocols over UDP. The bound destination
port is used to map to an IP protocol, and the XFRM framework
(udp_encap_rcv) is used to receive encapsulated packets. Upon
reception, the encapsulation header is logically removed (pointer
to transport header is advanced) and the packet is reinjected into
the receive path with the IP protocol indicated by the mapping.

Netlink is used to configure FOU ports. The configuration information
includes the port number to bind to and the IP protocol corresponding
to that port.

This should support GRE/UDP
(http://tools.ietf.org/html/draft-yong-tsvwg-gre-in-udp-encap-02),
as will as the other IP tunneling protocols (IPIP, SIT).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:31 -04:00
Tom Herbert
ce3e02867e net: Export inet_offloads and inet6_offloads
Want to be able to use these in foo-over-udp offloads, etc.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:31 -04:00
John Fastabend
4e2840eee6 net: sched: cls_u32: rcu can not be last node
tc_u32_sel 'sel' in tc_u_knode expects to be the last element in the
structure and pads the structure with tc_u32_key fields for each key.

 kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL)

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:05:45 -04:00
David S. Miller
8f665f6cb7 Merge tag 'master-2014-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-09-17

Please pull this batch of fixes intended for the 3.17 stream...

Arend van Spriel sends a trio of minor brcmfmac fixes, including a
fix for a Kconfig/build issue, a fix for a crash (null reference),
and a regression fix related to event handling on a P2P interface.

Hante Meuleman follows-up with a brcmfmac fix for a memory leak.

Johannes Stezenbach brings an ath9k_htc fix for a regression related
to hardware decryption offload.

Marcel Holtmann delivers a one-liner to properly mark a device ID
table in rfkill-gpio.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:33:15 -04:00
Eric Dumazet
ab34f64808 net: sched: use __skb_queue_head_init() where applicable
pfifo_fast and htb use skb lists, without needing their spinlocks.
(They instead use the standard qdisc lock)

We can use __skb_queue_head_init() instead of skb_queue_head_init()
to be consistent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:32:10 -04:00
Florian Fainelli
6819563e64 net: dsa: allow switch drivers to specify phy_device::dev_flags
Some switch drivers (e.g: bcm_sf2) may have to communicate specific
workarounds or flags towards the PHY device driver. Allow switches
driver to be delegated that task by introducing a get_phy_flags()
callback which will do just that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:07 -04:00
Eric Dumazet
2e4e441071 net: add alloc_skb_with_frags() helper
Extract from sock_alloc_send_pskb() code building skb with frags,
so that we can reuse this in other contexts.

Intent is to use it from tcp_send_rcvq(), tcp_collapse(), ...

We also want to replace some skb_linearize() calls to a more reliable
strategy in pathological cases where we need to reduce number of frags.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:25:23 -04:00
Eric Dumazet
cb93471acc tcp: do not fake tcp headers in tcp_send_rcvq()
Now we no longer rely on having tcp headers for skbs in receive queue,
tcp repair do not need to build fake ones.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:04:13 -04:00
Andy Zhou
c8fffcea0a l2tp: Refactor l2tp core driver to make use of the common UDP tunnel functions
Simplify l2tp implementation using common UDP tunnel APIs.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Andy Zhou
6a93cc9052 udp-tunnel: Add a few more UDP tunnel APIs
Added a few more UDP tunnel APIs that can be shared by UDP based
tunnel protocol implementation. The main ones are highlighted below.

setup_udp_tunnel_sock() configures UDP listener socket for
receiving UDP encapsulated packets.

udp_tunnel_xmit_skb() and upd_tunnel6_xmit_skb() transmit skb
using UDP encapsulation.

udp_tunnel_sock_release() closes the UDP tunnel listener socket.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Andy Zhou
fd384412e1 udp_tunnel: Seperate ipv6 functions into its own file.
Add ip6_udp_tunnel.c for ipv6 UDP tunnel functions to avoid ifdefs
in udp_tunnel.c

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Pablo Neira Ayuso
84d7fce693 netfilter: nf_tables: export rule-set generation ID
This patch exposes the ruleset generation ID in three ways:

1) The new command NFT_MSG_GETGEN that exposes the 32-bits ruleset
   generation ID. This ID is incremented in every commit and it
   should be large enough to avoid wraparound problems.

2) The less significant 16-bits of the generation ID are exposed through
   the nfgenmsg->res_id header field. This allows us to quickly catch
   if the ruleset has change between two consecutive list dumps from
   different object lists (in this specific case I think the risk of
   wraparound is unlikely).

3) Userspace subscribers may receive notifications of new rule-set
   generation after every commit. This also provides an alternative
   way to monitor the generation ID. If the events are lost, the
   userspace process hits a overrun error, so it knows that it is
   working with a stale ruleset anyway.

Patrick spotted that rule-set transformations in userspace may take
quite some time. In that case, it annotates the 32-bits generation ID
before fetching the rule-set, then:

1) it compares it to what we obtain after the transformation to
   make sure it is not working with a stale rule-set and no wraparound
   has ocurred.

2) it subscribes to ruleset notifications, so it can watch for new
   generation ID.

This is complementary to the NLM_F_DUMP_INTR approach, which allows
us to detect an interference in the middle one single list dumping.
There is no way to explicitly check that an interference has occurred
between two list dumps from the kernel, since it doesn't know how
many lists the userspace client is actually going to dump.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-19 11:14:43 +02:00
Pablo Neira Ayuso
fc04733a1a netfilter: nfnetlink: use original skbuff when committing/aborting
This allows us to access the original content of the batch from
the commit and the abort paths.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-19 11:14:42 +02:00
Johan Hedberg
5eb596f55c Bluetooth: Fix setting correct security level when initiating SMP
We can only determine the final security level when both pairing request
and response have been exchanged. When initiating pairing the starting
target security level is set to MEDIUM unless explicitly specified to be
HIGH, so that we can still perform pairing even if the remote doesn't
have MITM capabilities. However, once we've received the pairing
response we should re-consult the remote and local IO capabilities and
upgrade the target security level if necessary.

Without this patch the resulting Long Term Key will occasionally be
reported to be unauthenticated when it in reality is an authenticated
one.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
2014-09-18 17:39:37 +02:00
Pablo Neira Ayuso
fcfa8f493f Merge branch 'ipvs-next'
Simon Horman says:

====================
This pull requests makes the following changes:

* Add simple weighted fail-over scheduler.
  - Unlike other IPVS schedulers this offers fail-over rather than load
    balancing. Connections are directed to the appropriate server based
    solely on highest weight value and server availability.
  - Thanks to Kenny Mathis

* Support IPv6 real servers in IPv4 virtual-services and vice versa
  - This feature is supported in conjunction with the tunnel (IPIP)
    forwarding mechanism. That is, IPv4 may be forwarded in IPv6 and
    vice versa.
  - The motivation for this is to allow more flexibility in the
    choice of IP version offered by both virtual-servers and
    real-servers as they no longer need to match: An IPv4 connection from an
    end-user may be forwarded to a real-server using IPv6 and vice versa.
  - Further work need to be done to support this feature in conjunction
    with connection synchronisation. For now such configurations are
    not allowed.
  - This change includes update to netlink protocol, adding a new
    destination address family attribute. And the necessary changes
    to plumb this information throughout IPVS.
  - Thanks to Alex Gartrell and Julian Anastasov
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-18 10:59:33 +02:00
Herbert Xu
689f1c9de2 ipsec: Remove obsolete MAX_AH_AUTH_LEN
While tracking down the MAX_AH_AUTH_LEN crash in an old kernel
I thought that this limit was rather arbitrary and we should
just get rid of it.

In fact it seems that we've already done all the work needed
to remove it apart from actually removing it.  This limit was
there in order to limit stack usage.  Since we've already
switched over to allocating scratch space using kmalloc, there
is no longer any need to limit the authentication length.

This patch kills all references to it, including the BUG_ONs
that led me here.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-18 10:54:36 +02:00
Alex Gartrell
bc18d37f67 ipvs: Allow heterogeneous pools now that we support them
Remove the temporary consistency check and add a case statement to only
allow ipip mixed dests.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18 08:59:29 +09:00
Julian Anastasov
f18ae7206e ipvs: use the new dest addr family field
Use the new address family field cp->daf when printing
cp->daddr in logs or connection listing.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18 08:59:28 +09:00
Julian Anastasov
4d316f3f9a ipvs: use correct address family in scheduler logs
Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18 08:59:23 +09:00
Marcel Holtmann
0097db06f5 Bluetooth: Remove exported hci_recv_fragment function
The hci_recv_fragment function is no longer used by any driver and thus
do not export it. In fact it is not even needed by the core and it can
be removed altogether.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-09-17 10:23:03 +03:00
John Fastabend
9f6c38e70b net: sched: cls_cgroup need tcf_exts_init in all cases
This ensures the tcf_exts_init() is called for all cases.

Fixes: 952313bd62 ("net: sched: cls_cgroup use RCU")
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 16:26:39 -04:00
David S. Miller
2d9d65fa44 Merge branch 'net_next_ovs' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch
Pravin B Shelar says:

====================
Open vSwitch

Following patches adds recirculation and hash action to OVS.
First patch removes pointer to stack object. Next three patches
does code restructuring which is required for last patch.
Recirculation implementation is changed, according to comments from
David Miller, to avoid using recursive calls in OVS. It is using
queue to record recirc action and deferred recirc is executed at
the end of current actions execution.

v1-v2:
Changed subsystem name in subject to openvswitch
v2-v3:
Added patch to remove pkt_key pointer from skb->cb.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 16:21:48 -04:00
Marcel Holtmann
dda3b191eb net: rfkill: gpio: Enable module auto-loading for ACPI based switches
For the ACPI based switches the MODULE_DEVICE_TABLE is missing to
export the entries for module auto-loading.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-09-16 16:09:01 -04:00
John Fastabend
e1f93eb06c net: sched: cls_fw: add missing tcf_exts_init call in fw_change()
When allocating a new structure we also need to call tcf_exts_init
to initialize exts.

A follow up patch might be in order to remove some of this code
and do tcf_exts_assign(). With this we could remove the
tcf_exts_init/tcf_exts_change pattern for some of the classifiers.
As part of the future tcf_actions RCU series this will need to be
done. For now fix the call here.

Fixes e35a8ee599 ("net: sched: fw use RCU")
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:59:36 -04:00
John Fastabend
d14cbfc88f net: sched: cls_cgroup fix possible memory leak of 'new'
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   54996b529a
commit: c7953ef230 [625/646] net: sched: cls_cgroup use RCU

net/sched/cls_cgroup.c:130 cls_cgroup_change() warn: possible memory leak of 'new'
net/sched/cls_cgroup.c:135 cls_cgroup_change() warn: possible memory leak of 'new'
net/sched/cls_cgroup.c:139 cls_cgroup_change() warn: possible memory leak of 'new'

Fixes: c7953ef230 ("net: sched: cls_cgroup use RCU")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:59:36 -04:00
John Fastabend
a96366bf26 net: sched: cls_u32 add missing rcu_assign_pointer and annotation
Add missing rcu_assign_pointer and missing  annotation for ht_up
in cls_u32.c

Caught by kbuild bot,

>> net/sched/cls_u32.c:378:36: sparse: incorrect type in initializer (different address spaces)
   net/sched/cls_u32.c:378:36:    expected struct tc_u_hnode *ht
   net/sched/cls_u32.c:378:36:    got struct tc_u_hnode [noderef] <asn:4>*ht_up
>> net/sched/cls_u32.c:610:54: sparse: incorrect type in argument 4 (different address spaces)
   net/sched/cls_u32.c:610:54:    expected struct tc_u_hnode *ht
   net/sched/cls_u32.c:610:54:    got struct tc_u_hnode [noderef] <asn:4>*ht_up
>> net/sched/cls_u32.c:684:18: sparse: incorrect type in assignment (different address spaces)
   net/sched/cls_u32.c:684:18:    expected struct tc_u_hnode [noderef] <asn:4>*ht_up
   net/sched/cls_u32.c:684:18:    got struct tc_u_hnode *[assigned] ht
>> net/sched/cls_u32.c:359:18: sparse: dereference of noderef expression

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:59:36 -04:00
John Fastabend
80aab73de4 net: sched: fix unsued cpu variable
kbuild test robot reported an unused variable cpu in cls_u32.c
after the patch below. This happens when PERF and MARK config
variables are disabled

Fix this is to use separate variables for perf and mark
and define the cpu variable inside the ifdef logic.

Fixes: 459d5f626d ("net: sched: make cls_u32 per cpu")'
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:59:36 -04:00
WANG Cong
69301eaa7f net_sched: fix a null pointer dereference in tcindex_set_parms()
This patch fixes the following crash:

[   42.199159] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[   42.200027] IP: [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526
[   42.200027] PGD d2319067 PUD d4ffe067 PMD 0
[   42.200027] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[   42.200027] CPU: 0 PID: 541 Comm: tc Not tainted 3.17.0-rc4+ #603
[   42.200027] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   42.200027] task: ffff8800d22d2670 ti: ffff8800ce790000 task.ti: ffff8800ce790000
[   42.200027] RIP: 0010:[<ffffffff817e3fc4>]  [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526
[   42.200027] RSP: 0018:ffff8800ce793898  EFLAGS: 00010202
[   42.200027] RAX: 0000000000000001 RBX: ffff8800d1786498 RCX: 0000000000000000
[   42.200027] RDX: ffffffff82114ec8 RSI: ffffffff82114ec8 RDI: ffffffff82114ec8
[   42.200027] RBP: ffff8800ce793958 R08: 00000000000080d0 R09: 0000000000000001
[   42.200027] R10: ffff8800ce7939a0 R11: 0000000000000246 R12: ffff8800d017d238
[   42.200027] R13: 0000000000000018 R14: ffff8800d017c6a0 R15: ffff8800d1786620
[   42.200027] FS:  00007f4e24539740(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
[   42.200027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   42.200027] CR2: 0000000000000018 CR3: 00000000cff38000 CR4: 00000000000006f0
[   42.200027] Stack:
[   42.200027]  ffff8800ce0949f0 0000000000000000 0000000200000003 ffff880000000000
[   42.200027]  ffff8800ce7938b8 ffff8800ce7938b8 0000000600000007 0000000000000000
[   42.200027]  ffff8800ce7938d8 ffff8800ce7938d8 0000000600000007 ffff8800ce0949f0
[   42.200027] Call Trace:
[   42.200027]  [<ffffffff817e4169>] tcindex_change+0xdb/0xee
[   42.200027]  [<ffffffff817c16ca>] tc_ctl_tfilter+0x44d/0x63f
[   42.200027]  [<ffffffff8179d161>] rtnetlink_rcv_msg+0x181/0x194
[   42.200027]  [<ffffffff8179cf9d>] ? rtnl_lock+0x17/0x19
[   42.200027]  [<ffffffff8179cfe0>] ? __rtnl_unlock+0x17/0x17
[   42.200027]  [<ffffffff817ee296>] netlink_rcv_skb+0x49/0x8b
[   43.462494]  [<ffffffff8179cfc2>] rtnetlink_rcv+0x23/0x2a
[   43.462494]  [<ffffffff817ec8df>] netlink_unicast+0xc7/0x148
[   43.462494]  [<ffffffff817ed413>] netlink_sendmsg+0x5cb/0x63d
[   43.462494]  [<ffffffff810ad781>] ? mark_lock+0x2e/0x224
[   43.462494]  [<ffffffff817757b8>] __sock_sendmsg_nosec+0x25/0x27
[   43.462494]  [<ffffffff81778165>] sock_sendmsg+0x57/0x71
[   43.462494]  [<ffffffff81152bbd>] ? might_fault+0x57/0xa4
[   43.462494]  [<ffffffff81152c06>] ? might_fault+0xa0/0xa4
[   43.462494]  [<ffffffff81152bbd>] ? might_fault+0x57/0xa4
[   43.462494]  [<ffffffff817838fd>] ? verify_iovec+0x69/0xb7
[   43.462494]  [<ffffffff817784f8>] ___sys_sendmsg+0x21d/0x2bb
[   43.462494]  [<ffffffff81009db3>] ? native_sched_clock+0x35/0x37
[   43.462494]  [<ffffffff8109ab53>] ? sched_clock_local+0x12/0x72
[   43.462494]  [<ffffffff810ad781>] ? mark_lock+0x2e/0x224
[   43.462494]  [<ffffffff8109ada4>] ? sched_clock_cpu+0xa0/0xb9
[   43.462494]  [<ffffffff810aee37>] ? __lock_acquire+0x5fe/0xde4
[   43.462494]  [<ffffffff8119f570>] ? rcu_read_lock_held+0x36/0x38
[   43.462494]  [<ffffffff8119f75a>] ? __fcheck_files.isra.7+0x4b/0x57
[   43.462494]  [<ffffffff8119fbf2>] ? __fget_light+0x30/0x54
[   43.462494]  [<ffffffff81779012>] __sys_sendmsg+0x42/0x60
[   43.462494]  [<ffffffff81779042>] SyS_sendmsg+0x12/0x1c
[   43.462494]  [<ffffffff819d24d2>] system_call_fastpath+0x16/0x1b

'p->h' could be NULL while 'cp->h' is always update to date.

Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-By: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:20:09 -04:00
WANG Cong
44b75e4317 net_sched: fix memory leak in cls_tcindex
Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-By: John Fastabend <john.r.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:19:23 -04:00
David Howells
0c903ab64f KEYS: Make the key matching functions return bool
Make the key matching functions pointed to by key_match_data::cmp return bool
rather than int.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
2014-09-16 17:36:08 +01:00
David Howells
c06cfb08b8 KEYS: Remove key_type::match in favour of overriding default by match_preparse
A previous patch added a ->match_preparse() method to the key type.  This is
allowed to override the function called by the iteration algorithm.
Therefore, we can just set a default that simply checks for an exact match of
the key description with the original criterion data and allow match_preparse
to override it as needed.

The key_type::match op is then redundant and can be removed, as can the
user_match() function.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
2014-09-16 17:36:06 +01:00
David Howells
462919591a KEYS: Preparse match data
Preparse the match data.  This provides several advantages:

 (1) The preparser can reject invalid criteria up front.

 (2) The preparser can convert the criteria to binary data if necessary (the
     asymmetric key type really wants to do binary comparison of the key IDs).

 (3) The preparser can set the type of search to be performed.  This means
     that it's not then a one-off setting in the key type.

 (4) The preparser can set an appropriate comparator function.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
2014-09-16 17:36:02 +01:00
Steffen Klassert
b8c203b2d2 xfrm: Generate queueing routes only from route lookup functions
Currently we genarate a queueing route if we have matching policies
but can not resolve the states and the sysctl xfrm_larval_drop is
disabled. Here we assume that dst_output() is called to kill the
queued packets. Unfortunately this assumption is not true in all
cases, so it is possible that these packets leave the system unwanted.

We fix this by generating queueing routes only from the
route lookup functions, here we can guarantee a call to
dst_output() afterwards.

Fixes: a0073fe18e ("xfrm: Add a state resolution packet queue")
Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-16 10:08:49 +02:00
Steffen Klassert
f92ee61982 xfrm: Generate blackhole routes only from route lookup functions
Currently we genarate a blackhole route route whenever we have
matching policies but can not resolve the states. Here we assume
that dst_output() is called to kill the balckholed packets.
Unfortunately this assumption is not true in all cases, so
it is possible that these packets leave the system unwanted.

We fix this by generating blackhole routes only from the
route lookup functions, here we can guarantee a call to
dst_output() afterwards.

Fixes: 2774c131b1 ("xfrm: Handle blackhole route creation via afinfo.")
Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-16 10:08:40 +02:00
Andy Zhou
971427f353 openvswitch: Add recirc and hash action.
Recirc action allows a packet to reenter openvswitch processing.
currently openvswitch lookup flow for packet received and execute
set of actions on that packet, with help of recirc action we can
process/modify the packet and recirculate it back in openvswitch
for another pass.

OVS hash action calculates 5-tupple hash and set hash in flow-key
hash. This can be used along with recirculation for distributing
packets among different ports for bond devices.
For example:
OVS bonding can use following actions:
Match on: bond flow; Action: hash, recirc(id)
Match on: recirc-id == id and hash lower bits == a;
          Action: output port_bond_a

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 23:28:14 -07:00
Andy Zhou
32ae87ff79 openvswitch: simplify sample action implementation
The current sample() function implementation is more complicated
than necessary in handling single user space action optimization
and skb reference counting. There is no functional changes.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 23:28:14 -07:00
Pravin B Shelar
8c8b1b83fc openvswitch: Use tun_key only for egress tunnel path.
Currently tun_key is used for passing tunnel information
on ingress and egress path, this cause confusion.  Following
patch removes its use on ingress path make it egress only parameter.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-09-15 23:28:13 -07:00
Pravin B Shelar
83c8df26a3 openvswitch: refactor ovs flow extract API.
OVS flow extract is called on packet receive or packet
execute code path.  Following patch defines separate API
for extracting flow-key in packet execute code path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-09-15 23:28:13 -07:00
Pravin B Shelar
2ff3e4e486 openvswitch: Remove pkt_key from OVS_CB
OVS keeps pointer to packet key in skb->cb, but the packet key is
store on stack. This could make code bit tricky. So it is better to
get rid of the pointer.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 23:28:13 -07:00
Julian Anastasov
cf34e646da ipvs: address family of LBLCR entry depends on svc family
The LBLCR entries should use svc->af, not dest->af.
Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:38 +09:00
Julian Anastasov
f7fa380069 ipvs: address family of LBLC entry depends on svc family
The LBLC entries should use svc->af, not dest->af.
Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:38 +09:00
Alex Gartrell
8052ba2925 ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding
Pull the common logic for preparing an skb to prepend the header into a
single function and then set fields such that they can be used in either
case (generalize tos and tclass to dscp, hop_limit and ttl to ttl, etc)

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:37 +09:00
Alex Gartrell
c63e4de2be ipvs: Add generic ensure_mtu_is_adequate to handle mixed pools
The out_rt functions check to see if the mtu is large enough for the packet
and, if not, send icmp messages (TOOBIG or DEST_UNREACH) to the source and
bail out.  We needed the ability to send ICMP from the out_rt_v6 function
and DEST_UNREACH from the out_rt function, so we just pulled it out into a
common function.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:37 +09:00
Alex Gartrell
919aa0b2bb ipvs: Pull out update_pmtu code
Another step toward heterogeneous pools, this removes another piece of
functionality currently specific to each address family type.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:36 +09:00
Alex Gartrell
4a4739d56b ipvs: Pull out crosses_local_route_boundary logic
This logic is repeated in both out_rt functions so it was redundant.
Additionally, we'll need to be able to do checks to route v4 to v6 and vice
versa in order to deal with heterogeneous pools.

This patch also updates the callsites to add an additional parameter to the
out route functions.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:36 +09:00
Alex Gartrell
391f503d69 ipvs: prevent mixing heterogeneous pools and synchronization
The synchronization protocol is not compatible with heterogeneous pools, so
we need to verify that we're not turning both on at the same time.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:35 +09:00
Alex Gartrell
ba38528aae ipvs: Supply destination address family to ip_vs_conn_new
The assumption that dest af is equal to service af is now unreliable, so we
must specify it manually so as not to copy just the first 4 bytes of a v6
address or doing an illegal read of 16 butes on a v6 address.

We "lie" in two places: for synchronization (which we will explicitly
disallow from happening when we have heterogeneous pools) and for black
hole addresses where there's no real dest.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:34 +09:00
Alex Gartrell
ad147aa4dd ipvs: Pass destination address family to ip_vs_trash_get_dest
Part of a series of diffs to tease out destination family from virtual
family.  This diff just adds a parameter to ip_vs_trash_get and then uses
it for comparison rather than svc->af.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:34 +09:00
Alex Gartrell
655eef103d ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest}
We need to remove the assumption that virtual address family is the same as
real address family in order to support heterogeneous services (that is,
services with v4 vips and v6 backends or the opposite).

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:33 +09:00
Alex Gartrell
6cff339bbd ipvs: Add destination address family to netlink interface
This is necessary to support heterogeneous pools.  For example, if you have
an ipv6 addressed network, you'll want to be able to forward ipv4 traffic
into it.

This patch enforces that destination address family is the same as service
family, as none of the forwarding mechanisms support anything else.

For the old setsockopt mechanism, we simply set the dest address family to
AF_INET as we do with the service.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:33 +09:00