Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-07-03

The following pull-request contains BPF updates for your *net-next* tree.

There is a minor merge conflict in mlx5 due to 8960b38932 ("linux/dim:
Rename externally used net_dim members") which has been pulled into your
tree in the meantime, but the resolution does not look too bad ... getting
the current bpf-next out now before more lands on mlx5. ;) I'm Cc'ing Saeed
just so he's aware of the resolution below:

** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:

  <<<<<<< HEAD
  static int mlx5e_open_cq(struct mlx5e_channel *c,
                           struct dim_cq_moder moder,
                           struct mlx5e_cq_param *param,
                           struct mlx5e_cq *cq)
  =======
  int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder,
                    struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
  >>>>>>> e5a3e259ef

Resolution is to take the second chunk and rename net_dim_cq_moder to
dim_cq_moder. Also the signature for mlx5e_open_cq() in ...

  drivers/net/ethernet/mellanox/mlx5/core/en.h +977

... and in mlx5e_open_xsk() ...

  drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64

... needs the same rename from net_dim_cq_moder to dim_cq_moder.

** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:

  <<<<<<< HEAD
          int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
          struct dim_cq_moder icocq_moder = {0, 0};
          struct net_device *netdev = priv->netdev;
          struct mlx5e_channel *c;
          unsigned int irq;
  =======
          struct net_dim_cq_moder icocq_moder = {0, 0};
  >>>>>>> e5a3e259ef

Take the second chunk and rename net_dim_cq_moder to dim_cq_moder
as well.

Let me know if you run into any issues. Anyway, the main changes are:

1) Long-awaited AF_XDP support for mlx5e driver, from Maxim.

2) Addition of two new per-cgroup BPF hooks for getsockopt and
   setsockopt along with a new sockopt program type which allows more
   fine-grained pass/reject settings for containers. Also add a sock_ops
   callback that can be selectively enabled on a per-socket basis and is
   executed for every RTT to help track TCP statistics. Both features
   are from Stanislav.

3) Follow-up fix for precision tracking with loops, which was not
   propagating precision marks; as a result the verifier assumed that some
   branches were not taken and wrongly removed them as dead code, from Alexei.

4) Fix BPF cgroup release synchronization race which could lead to a
   double-free if a leaf's cgroup_bpf object is released and a new BPF
   program is attached to one of the ancestor cgroups in parallel, from Roman.

5) Support for bulking XDP_TX on veth devices which improves performance
   in some cases by around 9%, from Toshiaki.

6) Allow for lookups into BPF devmap and improve feedback when calling into
   bpf_redirect_map() as lookup is now performed right away in the helper
   itself, from Toke.

7) Add support for fq's Earliest Departure Time to the Host Bandwidth
   Manager (HBM) sample BPF program, from Lawrence.

8) Various cleanups and minor fixes all over the place from many others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller
2019-07-04 12:48:21 -07:00
98 changed files with 6225 additions and 869 deletions


@@ -29,7 +29,8 @@ CGROUP COMMANDS
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
| *ATTACH_TYPE* := { **ingress** | **egress** | **sock_create** | **sock_ops** | **device** |
| **bind4** | **bind6** | **post_bind4** | **post_bind6** | **connect4** | **connect6** |
| **sendmsg4** | **sendmsg6** | **recvmsg4** | **recvmsg6** | **sysctl** }
| **sendmsg4** | **sendmsg6** | **recvmsg4** | **recvmsg6** | **sysctl** |
| **getsockopt** | **setsockopt** }
| *ATTACH_FLAGS* := { **multi** | **override** }
DESCRIPTION
@@ -90,7 +91,9 @@ DESCRIPTION
an unconnected udp4 socket (since 5.2);
**recvmsg6** call to recvfrom(2), recvmsg(2), recvmmsg(2) for
an unconnected udp6 socket (since 5.2);
**sysctl** sysctl access (since 5.2).
**sysctl** sysctl access (since 5.2);
**getsockopt** call to getsockopt (since 5.3);
**setsockopt** call to setsockopt (since 5.3).
**bpftool cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
Detach *PROG* from the cgroup *CGROUP* and attach type


@@ -40,7 +40,8 @@ PROG COMMANDS
| **lwt_seg6local** | **sockops** | **sk_skb** | **sk_msg** | **lirc_mode2** |
| **cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** |
| **cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6** |
| **cgroup/recvmsg4** | **cgroup/recvmsg6** | **cgroup/sysctl**
| **cgroup/recvmsg4** | **cgroup/recvmsg6** | **cgroup/sysctl** |
| **cgroup/getsockopt** | **cgroup/setsockopt**
| }
| *ATTACH_TYPE* := {
| **msg_verdict** | **stream_verdict** | **stream_parser** | **flow_dissector**


@@ -379,7 +379,8 @@ _bpftool()
cgroup/sendmsg4 cgroup/sendmsg6 \
cgroup/recvmsg4 cgroup/recvmsg6 \
cgroup/post_bind4 cgroup/post_bind6 \
cgroup/sysctl" -- \
cgroup/sysctl cgroup/getsockopt \
cgroup/setsockopt" -- \
"$cur" ) )
return 0
;;
@@ -689,7 +690,8 @@ _bpftool()
attach|detach)
local ATTACH_TYPES='ingress egress sock_create sock_ops \
device bind4 bind6 post_bind4 post_bind6 connect4 \
connect6 sendmsg4 sendmsg6 recvmsg4 recvmsg6 sysctl'
connect6 sendmsg4 sendmsg6 recvmsg4 recvmsg6 sysctl \
getsockopt setsockopt'
local ATTACH_FLAGS='multi override'
local PROG_TYPE='id pinned tag'
case $prev in
@@ -699,7 +701,8 @@ _bpftool()
;;
ingress|egress|sock_create|sock_ops|device|bind4|bind6|\
post_bind4|post_bind6|connect4|connect6|sendmsg4|\
sendmsg6|recvmsg4|recvmsg6|sysctl)
sendmsg6|recvmsg4|recvmsg6|sysctl|getsockopt|\
setsockopt)
COMPREPLY=( $( compgen -W "$PROG_TYPE" -- \
"$cur" ) )
return 0


@@ -26,7 +26,8 @@
" sock_ops | device | bind4 | bind6 |\n" \
" post_bind4 | post_bind6 | connect4 |\n" \
" connect6 | sendmsg4 | sendmsg6 |\n" \
" recvmsg4 | recvmsg6 | sysctl }"
" recvmsg4 | recvmsg6 | sysctl |\n" \
" getsockopt | setsockopt }"
static const char * const attach_type_strings[] = {
[BPF_CGROUP_INET_INGRESS] = "ingress",
@@ -45,6 +46,8 @@ static const char * const attach_type_strings[] = {
[BPF_CGROUP_SYSCTL] = "sysctl",
[BPF_CGROUP_UDP4_RECVMSG] = "recvmsg4",
[BPF_CGROUP_UDP6_RECVMSG] = "recvmsg6",
[BPF_CGROUP_GETSOCKOPT] = "getsockopt",
[BPF_CGROUP_SETSOCKOPT] = "setsockopt",
[__MAX_BPF_ATTACH_TYPE] = NULL,
};


@@ -74,6 +74,7 @@ static const char * const prog_type_name[] = {
[BPF_PROG_TYPE_SK_REUSEPORT] = "sk_reuseport",
[BPF_PROG_TYPE_FLOW_DISSECTOR] = "flow_dissector",
[BPF_PROG_TYPE_CGROUP_SYSCTL] = "cgroup_sysctl",
[BPF_PROG_TYPE_CGROUP_SOCKOPT] = "cgroup_sockopt",
};
extern const char * const map_type_name[];


@@ -1071,7 +1071,8 @@ static int do_help(int argc, char **argv)
" cgroup/bind4 | cgroup/bind6 | cgroup/post_bind4 |\n"
" cgroup/post_bind6 | cgroup/connect4 | cgroup/connect6 |\n"
" cgroup/sendmsg4 | cgroup/sendmsg6 | cgroup/recvmsg4 |\n"
" cgroup/recvmsg6 }\n"
" cgroup/recvmsg6 | cgroup/getsockopt |\n"
" cgroup/setsockopt }\n"
" ATTACH_TYPE := { msg_verdict | stream_verdict | stream_parser |\n"
" flow_dissector }\n"
" " HELP_SPEC_OPTIONS "\n"


@@ -170,6 +170,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_FLOW_DISSECTOR,
BPF_PROG_TYPE_CGROUP_SYSCTL,
BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
BPF_PROG_TYPE_CGROUP_SOCKOPT,
};
enum bpf_attach_type {
@@ -194,6 +195,8 @@ enum bpf_attach_type {
BPF_CGROUP_SYSCTL,
BPF_CGROUP_UDP4_RECVMSG,
BPF_CGROUP_UDP6_RECVMSG,
BPF_CGROUP_GETSOCKOPT,
BPF_CGROUP_SETSOCKOPT,
__MAX_BPF_ATTACH_TYPE
};
@@ -1764,6 +1767,7 @@ union bpf_attr {
* * **BPF_SOCK_OPS_RTO_CB_FLAG** (retransmission time out)
* * **BPF_SOCK_OPS_RETRANS_CB_FLAG** (retransmission)
* * **BPF_SOCK_OPS_STATE_CB_FLAG** (TCP state change)
* * **BPF_SOCK_OPS_RTT_CB_FLAG** (every RTT)
*
* Therefore, this function can be used to clear a callback flag by
* setting the appropriate bit to zero. e.g. to disable the RTO
@@ -3066,6 +3070,12 @@ struct bpf_tcp_sock {
* sum(delta(snd_una)), or how many bytes
* were acked.
*/
__u32 dsack_dups; /* RFC4898 tcpEStatsStackDSACKDups
* total number of DSACK blocks received
*/
__u32 delivered; /* Total data packets delivered incl. rexmits */
__u32 delivered_ce; /* Like the above but only ECE marked packets */
__u32 icsk_retransmits; /* Number of unrecovered [RTO] timeouts */
};
struct bpf_sock_tuple {
@@ -3308,7 +3318,8 @@ struct bpf_sock_ops {
#define BPF_SOCK_OPS_RTO_CB_FLAG (1<<0)
#define BPF_SOCK_OPS_RETRANS_CB_FLAG (1<<1)
#define BPF_SOCK_OPS_STATE_CB_FLAG (1<<2)
#define BPF_SOCK_OPS_ALL_CB_FLAGS 0x7 /* Mask of all currently
#define BPF_SOCK_OPS_RTT_CB_FLAG (1<<3)
#define BPF_SOCK_OPS_ALL_CB_FLAGS 0xF /* Mask of all currently
* supported cb flags
*/
@@ -3363,6 +3374,8 @@ enum {
BPF_SOCK_OPS_TCP_LISTEN_CB, /* Called on listen(2), right after
* socket transition to LISTEN state.
*/
BPF_SOCK_OPS_RTT_CB, /* Called on every RTT.
*/
};
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
@@ -3541,4 +3554,15 @@ struct bpf_sysctl {
*/
};
struct bpf_sockopt {
__bpf_md_ptr(struct bpf_sock *, sk);
__bpf_md_ptr(void *, optval);
__bpf_md_ptr(void *, optval_end);
__s32 level;
__s32 optname;
__s32 optlen;
__s32 retval;
};
#endif /* _UAPI__LINUX_BPF_H__ */
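
The callback-flag hunk in this header adds BPF_SOCK_OPS_RTT_CB_FLAG as bit 3 and widens BPF_SOCK_OPS_ALL_CB_FLAGS from 0x7 to 0xF. The sketch below only illustrates that mask arithmetic in plain userspace C. The flag values are copied from the hunk; `unsupported_cb_flags()` is a hypothetical helper written for this note and is not the kernel's implementation.

```c
#include <stdint.h>

/* Values copied from the uapi header hunk in this diff. */
#define BPF_SOCK_OPS_RTO_CB_FLAG     (1 << 0)
#define BPF_SOCK_OPS_RETRANS_CB_FLAG (1 << 1)
#define BPF_SOCK_OPS_STATE_CB_FLAG   (1 << 2)
#define BPF_SOCK_OPS_RTT_CB_FLAG     (1 << 3)
#define BPF_SOCK_OPS_ALL_CB_FLAGS    0xF

/*
 * Hypothetical helper (not a kernel function): return the bits of
 * 'requested' that fall outside the supported mask. A result of 0
 * means every requested flag is covered by the mask.
 */
static uint32_t unsupported_cb_flags(uint32_t requested)
{
	return requested & ~(uint32_t)BPF_SOCK_OPS_ALL_CB_FLAGS;
}
```

With the previous mask of 0x7, the same computation would have reported bit 3 as unsupported, which is presumably why the mask had to grow together with the new flag.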


@@ -46,6 +46,7 @@ struct xdp_mmap_offsets {
#define XDP_UMEM_FILL_RING 5
#define XDP_UMEM_COMPLETION_RING 6
#define XDP_STATISTICS 7
#define XDP_OPTIONS 8
struct xdp_umem_reg {
__u64 addr; /* Start of packet data area */
@@ -60,6 +61,13 @@ struct xdp_statistics {
__u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
};
struct xdp_options {
__u32 flags;
};
/* Flags for the flags field of struct xdp_options */
#define XDP_OPTIONS_ZEROCOPY (1 << 0)
/* Pgoff for mmaping the rings */
#define XDP_PGOFF_RX_RING 0
#define XDP_PGOFF_TX_RING 0x80000000
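
The new XDP_OPTIONS socket option lets userspace ask whether an AF_XDP socket runs in zero-copy mode. A minimal sketch of such a query follows, assuming `fd` is an AF_XDP socket. The fallback defines and the `xdp_options_compat` struct mirror the hunk above so the snippet builds against older headers; `xsk_fd_is_zerocopy()` is a hypothetical name, not a libbpf API.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

/* Fallbacks for toolchains whose headers predate this change. */
#ifndef SOL_XDP
#define SOL_XDP 283
#endif
#ifndef XDP_OPTIONS
#define XDP_OPTIONS 8
#endif
#ifndef XDP_OPTIONS_ZEROCOPY
#define XDP_OPTIONS_ZEROCOPY (1 << 0)
#endif

/* Local mirror of struct xdp_options from the hunk above. */
struct xdp_options_compat {
	uint32_t flags;
};

/*
 * Hypothetical helper: returns 1 if the socket reports zero-copy,
 * 0 if it reports copy mode, or a negative errno if the query fails.
 */
static int xsk_fd_is_zerocopy(int fd)
{
	struct xdp_options_compat opts = { 0 };
	socklen_t optlen = sizeof(opts);

	if (getsockopt(fd, SOL_XDP, XDP_OPTIONS, &opts, &optlen) < 0)
		return -errno;

	return (opts.flags & XDP_OPTIONS_ZEROCOPY) ? 1 : 0;
}
```

On a kernel without XDP_OPTIONS support the call is expected to fail, so callers should treat a negative return as "unknown" rather than as copy mode.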


@@ -778,7 +778,7 @@ static struct bpf_map *bpf_object__add_map(struct bpf_object *obj)
if (obj->nr_maps < obj->maps_cap)
return &obj->maps[obj->nr_maps++];
new_cap = max(4ul, obj->maps_cap * 3 / 2);
new_cap = max((size_t)4, obj->maps_cap * 3 / 2);
new_maps = realloc(obj->maps, new_cap * sizeof(*obj->maps));
if (!new_maps) {
pr_warning("alloc maps for object failed\n");
@@ -1169,7 +1169,7 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj,
pr_debug("map '%s': found key_size = %u.\n",
map_name, sz);
if (map->def.key_size && map->def.key_size != sz) {
pr_warning("map '%s': conflictling key size %u != %u.\n",
pr_warning("map '%s': conflicting key size %u != %u.\n",
map_name, map->def.key_size, sz);
return -EINVAL;
}
@@ -1197,7 +1197,7 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj,
pr_debug("map '%s': found key [%u], sz = %lld.\n",
map_name, t->type, sz);
if (map->def.key_size && map->def.key_size != sz) {
pr_warning("map '%s': conflictling key size %u != %lld.\n",
pr_warning("map '%s': conflicting key size %u != %lld.\n",
map_name, map->def.key_size, sz);
return -EINVAL;
}
@@ -1212,7 +1212,7 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj,
pr_debug("map '%s': found value_size = %u.\n",
map_name, sz);
if (map->def.value_size && map->def.value_size != sz) {
pr_warning("map '%s': conflictling value size %u != %u.\n",
pr_warning("map '%s': conflicting value size %u != %u.\n",
map_name, map->def.value_size, sz);
return -EINVAL;
}
@@ -1240,7 +1240,7 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj,
pr_debug("map '%s': found value [%u], sz = %lld.\n",
map_name, t->type, sz);
if (map->def.value_size && map->def.value_size != sz) {
pr_warning("map '%s': conflictling value size %u != %lld.\n",
pr_warning("map '%s': conflicting value size %u != %lld.\n",
map_name, map->def.value_size, sz);
return -EINVAL;
}
@@ -2646,6 +2646,7 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_CGROUP_SYSCTL:
case BPF_PROG_TYPE_CGROUP_SOCKOPT:
return false;
case BPF_PROG_TYPE_KPROBE:
default:
@@ -3604,6 +3605,10 @@ static const struct {
BPF_CGROUP_UDP6_RECVMSG),
BPF_EAPROG_SEC("cgroup/sysctl", BPF_PROG_TYPE_CGROUP_SYSCTL,
BPF_CGROUP_SYSCTL),
BPF_EAPROG_SEC("cgroup/getsockopt", BPF_PROG_TYPE_CGROUP_SOCKOPT,
BPF_CGROUP_GETSOCKOPT),
BPF_EAPROG_SEC("cgroup/setsockopt", BPF_PROG_TYPE_CGROUP_SOCKOPT,
BPF_CGROUP_SETSOCKOPT),
};
#undef BPF_PROG_SEC_IMPL
@@ -3867,10 +3872,7 @@ int bpf_prog_load(const char *file, enum bpf_prog_type type,
int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
struct bpf_object **pobj, int *prog_fd)
{
struct bpf_object_open_attr open_attr = {
.file = attr->file,
.prog_type = attr->prog_type,
};
struct bpf_object_open_attr open_attr = {};
struct bpf_program *prog, *first_prog = NULL;
enum bpf_attach_type expected_attach_type;
enum bpf_prog_type prog_type;
@@ -3883,6 +3885,9 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
if (!attr->file)
return -EINVAL;
open_attr.file = attr->file;
open_attr.prog_type = attr->prog_type;
obj = bpf_object__open_xattr(&open_attr);
if (IS_ERR_OR_NULL(obj))
return -ENOENT;
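
The first hunk of this file changes `max(4ul, ...)` to `max((size_t)4, ...)`, presumably so that both arguments have the same type on targets where size_t is not unsigned long. The growth rule itself is unchanged: multiply the capacity by 3/2 with a floor of 4. A standalone sketch of that rule, using hypothetical helper names `max_size()` and `next_maps_cap()`:

```c
#include <stddef.h>

/* Type-safe maximum for size_t; both operands share one type. */
static size_t max_size(size_t a, size_t b)
{
	return a > b ? a : b;
}

/*
 * Hypothetical stand-in for the capacity computation in
 * bpf_object__add_map(): grow by 3/2, but never below 4 slots.
 */
static size_t next_maps_cap(size_t maps_cap)
{
	return max_size((size_t)4, maps_cap * 3 / 2);
}
```

Starting from an empty array this yields capacities 4, 6, 9, 13, and so on, because the division truncates.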


@@ -101,6 +101,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
case BPF_PROG_TYPE_SK_REUSEPORT:
case BPF_PROG_TYPE_FLOW_DISSECTOR:
case BPF_PROG_TYPE_CGROUP_SYSCTL:
case BPF_PROG_TYPE_CGROUP_SOCKOPT:
default:
break;
}


@@ -65,6 +65,7 @@ struct xsk_socket {
int xsks_map_fd;
__u32 queue_id;
char ifname[IFNAMSIZ];
bool zc;
};
struct xsk_nl_info {
@@ -326,7 +327,8 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
channels.cmd = ETHTOOL_GCHANNELS;
ifr.ifr_data = (void *)&channels;
strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ);
strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
ifr.ifr_name[IFNAMSIZ - 1] = '\0';
err = ioctl(fd, SIOCETHTOOL, &ifr);
if (err && errno != EOPNOTSUPP) {
ret = -errno;
@@ -480,6 +482,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
void *rx_map = NULL, *tx_map = NULL;
struct sockaddr_xdp sxdp = {};
struct xdp_mmap_offsets off;
struct xdp_options opts;
struct xsk_socket *xsk;
socklen_t optlen;
int err;
@@ -597,6 +600,16 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
}
xsk->prog_fd = -1;
optlen = sizeof(opts);
err = getsockopt(xsk->fd, SOL_XDP, XDP_OPTIONS, &opts, &optlen);
if (err) {
err = -errno;
goto out_mmap_tx;
}
xsk->zc = opts.flags & XDP_OPTIONS_ZEROCOPY;
if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
err = xsk_setup_xdp_prog(xsk);
if (err)
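
The strncpy() change in xsk_get_max_queues() copies at most IFNAMSIZ - 1 bytes and then writes the terminator explicitly, since strncpy() does not terminate the destination when the source fills the whole buffer. The sketch below shows the same pattern in isolation. `IFNAME_MAX` is a stand-in for IFNAMSIZ and `copy_ifname()` is a hypothetical helper, not part of libbpf.

```c
#include <string.h>

#define IFNAME_MAX 16 /* stand-in for IFNAMSIZ */

/*
 * Copy an interface name into a fixed-size buffer, truncating if
 * needed and always leaving the result NUL-terminated. Returns the
 * length of the stored string.
 */
static size_t copy_ifname(char dst[IFNAME_MAX], const char *src)
{
	strncpy(dst, src, IFNAME_MAX - 1);
	dst[IFNAME_MAX - 1] = '\0';
	return strlen(dst);
}
```

A name longer than 15 characters is silently truncated here; a caller that needs to detect truncation would have to compare against strlen(src) separately.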


@@ -167,7 +167,7 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
#define XSK_RING_CONS__DEFAULT_NUM_DESCS 2048
#define XSK_RING_PROD__DEFAULT_NUM_DESCS 2048
#define XSK_UMEM__DEFAULT_FRAME_SHIFT 11 /* 2048 bytes */
#define XSK_UMEM__DEFAULT_FRAME_SHIFT 12 /* 4096 bytes */
#define XSK_UMEM__DEFAULT_FRAME_SIZE (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
#define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0


@@ -39,3 +39,6 @@ libbpf.so.*
test_hashmap
test_btf_dump
xdping
test_sockopt
test_sockopt_sk
test_sockopt_multi


@@ -15,7 +15,7 @@ LLC ?= llc
LLVM_OBJCOPY ?= llvm-objcopy
LLVM_READELF ?= llvm-readelf
BTF_PAHOLE ?= pahole
CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(BPFDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include \
CFLAGS += -g -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(BPFDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include \
-Dbpf_prog_load=bpf_prog_test_load \
-Dbpf_load_program=bpf_test_load_program
LDLIBS += -lcap -lelf -lrt -lpthread
@@ -26,7 +26,8 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
test_sock test_btf test_sockmap get_cgroup_id_user test_socket_cookie \
test_cgroup_storage test_select_reuseport test_section_names \
test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
test_btf_dump test_cgroup_attach xdping
test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
test_sockopt_multi test_tcp_rtt
BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
TEST_GEN_FILES = $(BPF_OBJ_FILES)
@@ -46,6 +47,7 @@ TEST_PROGS := test_kmod.sh \
test_libbpf.sh \
test_xdp_redirect.sh \
test_xdp_meta.sh \
test_xdp_veth.sh \
test_offload.py \
test_sock_addr.sh \
test_tunnel.sh \
@@ -102,6 +104,10 @@ $(OUTPUT)/test_netcnt: cgroup_helpers.c
$(OUTPUT)/test_sock_fields: cgroup_helpers.c
$(OUTPUT)/test_sysctl: cgroup_helpers.c
$(OUTPUT)/test_cgroup_attach: cgroup_helpers.c
$(OUTPUT)/test_sockopt: cgroup_helpers.c
$(OUTPUT)/test_sockopt_sk: cgroup_helpers.c
$(OUTPUT)/test_sockopt_multi: cgroup_helpers.c
$(OUTPUT)/test_tcp_rtt: cgroup_helpers.c
.PHONY: force


@@ -75,8 +75,7 @@ typedef struct {
void* co_name; // PyCodeObject.co_name
} FrameData;
static inline __attribute__((__always_inline__)) void*
get_thread_state(void* tls_base, PidData* pidData)
static __always_inline void *get_thread_state(void *tls_base, PidData *pidData)
{
void* thread_state;
int key;
@@ -87,8 +86,8 @@ get_thread_state(void* tls_base, PidData* pidData)
return thread_state;
}
static inline __attribute__((__always_inline__)) bool
get_frame_data(void* frame_ptr, PidData* pidData, FrameData* frame, Symbol* symbol)
static __always_inline bool get_frame_data(void *frame_ptr, PidData *pidData,
FrameData *frame, Symbol *symbol)
{
// read data from PyFrameObject
bpf_probe_read(&frame->f_back,
@@ -161,7 +160,7 @@ struct bpf_elf_map SEC("maps") stackmap = {
.max_elem = 1000,
};
static inline __attribute__((__always_inline__)) int __on_event(struct pt_regs *ctx)
static __always_inline int __on_event(struct pt_regs *ctx)
{
uint64_t pid_tgid = bpf_get_current_pid_tgid();
pid_t pid = (pid_t)(pid_tgid >> 32);


@@ -0,0 +1,71 @@
// SPDX-License-Identifier: GPL-2.0
#include <netinet/in.h>
#include <linux/bpf.h>
#include "bpf_helpers.h"
char _license[] SEC("license") = "GPL";
__u32 _version SEC("version") = 1;
SEC("cgroup/getsockopt/child")
int _getsockopt_child(struct bpf_sockopt *ctx)
{
__u8 *optval_end = ctx->optval_end;
__u8 *optval = ctx->optval;
if (ctx->level != SOL_IP || ctx->optname != IP_TOS)
return 1;
if (optval + 1 > optval_end)
return 0; /* EPERM, bounds check */
if (optval[0] != 0x80)
return 0; /* EPERM, unexpected optval from the kernel */
ctx->retval = 0; /* Reset system call return value to zero */
optval[0] = 0x90;
ctx->optlen = 1;
return 1;
}
SEC("cgroup/getsockopt/parent")
int _getsockopt_parent(struct bpf_sockopt *ctx)
{
__u8 *optval_end = ctx->optval_end;
__u8 *optval = ctx->optval;
if (ctx->level != SOL_IP || ctx->optname != IP_TOS)
return 1;
if (optval + 1 > optval_end)
return 0; /* EPERM, bounds check */
if (optval[0] != 0x90)
return 0; /* EPERM, unexpected optval from the kernel */
ctx->retval = 0; /* Reset system call return value to zero */
optval[0] = 0xA0;
ctx->optlen = 1;
return 1;
}
SEC("cgroup/setsockopt")
int _setsockopt(struct bpf_sockopt *ctx)
{
__u8 *optval_end = ctx->optval_end;
__u8 *optval = ctx->optval;
if (ctx->level != SOL_IP || ctx->optname != IP_TOS)
return 1;
if (optval + 1 > optval_end)
return 0; /* EPERM, bounds check */
optval[0] += 0x10;
ctx->optlen = 1;
return 1;
}
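
The two getsockopt programs above form a chain: the child expects the kernel value 0x80 and rewrites it to 0x90, and the parent expects 0x90 and rewrites it to 0xA0. The following userspace model walks through that chain without any BPF. It is a simplified sketch: `sockopt_ctx` is a cut-down stand-in for struct bpf_sockopt, the child-then-parent order follows the comments in the accompanying test, and the model stops at the first rejection, which may differ from how the kernel combines verdicts.

```c
#include <errno.h>
#include <stdint.h>

/* Cut-down stand-in for struct bpf_sockopt; not the kernel layout. */
struct sockopt_ctx {
	uint8_t *optval;
	uint8_t *optval_end;
	int optlen;
	int retval;
};

/* Shared body of the child and parent programs: 1 = allow, 0 = EPERM. */
static int rewrite_tos(struct sockopt_ctx *ctx, uint8_t expect, uint8_t replace)
{
	if (ctx->optval + 1 > ctx->optval_end)
		return 0; /* bounds check */
	if (ctx->optval[0] != expect)
		return 0; /* unexpected value from the previous stage */
	ctx->retval = 0;
	ctx->optval[0] = replace;
	ctx->optlen = 1;
	return 1;
}

/*
 * Hypothetical model of getsockopt(IP_TOS) with both programs
 * attached: returns the value userspace would see, or -EPERM.
 */
static int model_getsockopt_tos(uint8_t kernel_tos)
{
	uint8_t buf = kernel_tos;
	struct sockopt_ctx ctx = { &buf, &buf + 1, 1, 0 };

	if (!rewrite_tos(&ctx, 0x80, 0x90)) /* child */
		return -EPERM;
	if (!rewrite_tos(&ctx, 0x90, 0xA0)) /* parent */
		return -EPERM;
	return buf;
}
```

In this model a kernel value of 0x80 comes back as 0xA0, while 0x40 is rejected by the child, matching the cases the userspace test exercises with both programs attached.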


@@ -0,0 +1,111 @@
// SPDX-License-Identifier: GPL-2.0
#include <netinet/in.h>
#include <linux/bpf.h>
#include "bpf_helpers.h"
char _license[] SEC("license") = "GPL";
__u32 _version SEC("version") = 1;
#define SOL_CUSTOM 0xdeadbeef
struct sockopt_sk {
__u8 val;
};
struct bpf_map_def SEC("maps") socket_storage_map = {
.type = BPF_MAP_TYPE_SK_STORAGE,
.key_size = sizeof(int),
.value_size = sizeof(struct sockopt_sk),
.map_flags = BPF_F_NO_PREALLOC,
};
BPF_ANNOTATE_KV_PAIR(socket_storage_map, int, struct sockopt_sk);
SEC("cgroup/getsockopt")
int _getsockopt(struct bpf_sockopt *ctx)
{
__u8 *optval_end = ctx->optval_end;
__u8 *optval = ctx->optval;
struct sockopt_sk *storage;
if (ctx->level == SOL_IP && ctx->optname == IP_TOS)
/* Not interested in SOL_IP:IP_TOS;
* let next BPF program in the cgroup chain or kernel
* handle it.
*/
return 1;
if (ctx->level == SOL_SOCKET && ctx->optname == SO_SNDBUF) {
/* Not interested in SOL_SOCKET:SO_SNDBUF;
* let next BPF program in the cgroup chain or kernel
* handle it.
*/
return 1;
}
if (ctx->level != SOL_CUSTOM)
return 0; /* EPERM, deny everything except custom level */
if (optval + 1 > optval_end)
return 0; /* EPERM, bounds check */
storage = bpf_sk_storage_get(&socket_storage_map, ctx->sk, 0,
BPF_SK_STORAGE_GET_F_CREATE);
if (!storage)
return 0; /* EPERM, couldn't get sk storage */
if (!ctx->retval)
return 0; /* EPERM, kernel should not have handled
* SOL_CUSTOM, something is wrong!
*/
ctx->retval = 0; /* Reset system call return value to zero */
optval[0] = storage->val;
ctx->optlen = 1;
return 1;
}
SEC("cgroup/setsockopt")
int _setsockopt(struct bpf_sockopt *ctx)
{
__u8 *optval_end = ctx->optval_end;
__u8 *optval = ctx->optval;
struct sockopt_sk *storage;
if (ctx->level == SOL_IP && ctx->optname == IP_TOS)
/* Not interested in SOL_IP:IP_TOS;
* let next BPF program in the cgroup chain or kernel
* handle it.
*/
return 1;
if (ctx->level == SOL_SOCKET && ctx->optname == SO_SNDBUF) {
/* Overwrite SO_SNDBUF value */
if (optval + sizeof(__u32) > optval_end)
return 0; /* EPERM, bounds check */
*(__u32 *)optval = 0x55AA;
ctx->optlen = 4;
return 1;
}
if (ctx->level != SOL_CUSTOM)
return 0; /* EPERM, deny everything except custom level */
if (optval + 1 > optval_end)
return 0; /* EPERM, bounds check */
storage = bpf_sk_storage_get(&socket_storage_map, ctx->sk, 0,
BPF_SK_STORAGE_GET_F_CREATE);
if (!storage)
return 0; /* EPERM, couldn't get sk storage */
storage->val = optval[0];
ctx->optlen = -1; /* BPF has consumed this option, don't call kernel
* setsockopt handler.
*/
return 1;
}


@@ -266,8 +266,8 @@ struct tls_index {
uint64_t offset;
};
static inline __attribute__((always_inline))
void *calc_location(struct strobe_value_loc *loc, void *tls_base)
static __always_inline void *calc_location(struct strobe_value_loc *loc,
void *tls_base)
{
/*
* tls_mode value is:
@@ -327,10 +327,10 @@ void *calc_location(struct strobe_value_loc *loc, void *tls_base)
: NULL;
}
static inline __attribute__((always_inline))
void read_int_var(struct strobemeta_cfg *cfg, size_t idx, void *tls_base,
struct strobe_value_generic *value,
struct strobemeta_payload *data)
static __always_inline void read_int_var(struct strobemeta_cfg *cfg,
size_t idx, void *tls_base,
struct strobe_value_generic *value,
struct strobemeta_payload *data)
{
void *location = calc_location(&cfg->int_locs[idx], tls_base);
if (!location)
@@ -342,10 +342,11 @@ void read_int_var(struct strobemeta_cfg *cfg, size_t idx, void *tls_base,
data->int_vals_set_mask |= (1 << idx);
}
static inline __attribute__((always_inline))
uint64_t read_str_var(struct strobemeta_cfg* cfg, size_t idx, void *tls_base,
struct strobe_value_generic *value,
struct strobemeta_payload *data, void *payload)
static __always_inline uint64_t read_str_var(struct strobemeta_cfg *cfg,
size_t idx, void *tls_base,
struct strobe_value_generic *value,
struct strobemeta_payload *data,
void *payload)
{
void *location;
uint32_t len;
@@ -371,10 +372,11 @@ uint64_t read_str_var(struct strobemeta_cfg* cfg, size_t idx, void *tls_base,
return len;
}
static inline __attribute__((always_inline))
void *read_map_var(struct strobemeta_cfg *cfg, size_t idx, void *tls_base,
struct strobe_value_generic *value,
struct strobemeta_payload* data, void *payload)
static __always_inline void *read_map_var(struct strobemeta_cfg *cfg,
size_t idx, void *tls_base,
struct strobe_value_generic *value,
struct strobemeta_payload *data,
void *payload)
{
struct strobe_map_descr* descr = &data->map_descrs[idx];
struct strobe_map_raw map;
@@ -435,9 +437,9 @@ void *read_map_var(struct strobemeta_cfg *cfg, size_t idx, void *tls_base,
* read_strobe_meta returns NULL, if no metadata was read; otherwise returns
* pointer to *right after* payload ends
*/
static inline __attribute__((always_inline))
void *read_strobe_meta(struct task_struct* task,
struct strobemeta_payload* data) {
static __always_inline void *read_strobe_meta(struct task_struct *task,
struct strobemeta_payload *data)
{
pid_t pid = bpf_get_current_pid_tgid() >> 32;
struct strobe_value_generic value = {0};
struct strobemeta_cfg *cfg;


@@ -0,0 +1,61 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include "bpf_helpers.h"
char _license[] SEC("license") = "GPL";
__u32 _version SEC("version") = 1;
struct tcp_rtt_storage {
__u32 invoked;
__u32 dsack_dups;
__u32 delivered;
__u32 delivered_ce;
__u32 icsk_retransmits;
};
struct bpf_map_def SEC("maps") socket_storage_map = {
.type = BPF_MAP_TYPE_SK_STORAGE,
.key_size = sizeof(int),
.value_size = sizeof(struct tcp_rtt_storage),
.map_flags = BPF_F_NO_PREALLOC,
};
BPF_ANNOTATE_KV_PAIR(socket_storage_map, int, struct tcp_rtt_storage);
SEC("sockops")
int _sockops(struct bpf_sock_ops *ctx)
{
struct tcp_rtt_storage *storage;
struct bpf_tcp_sock *tcp_sk;
int op = (int) ctx->op;
struct bpf_sock *sk;
sk = ctx->sk;
if (!sk)
return 1;
storage = bpf_sk_storage_get(&socket_storage_map, sk, 0,
BPF_SK_STORAGE_GET_F_CREATE);
if (!storage)
return 1;
if (op == BPF_SOCK_OPS_TCP_CONNECT_CB) {
bpf_sock_ops_cb_flags_set(ctx, BPF_SOCK_OPS_RTT_CB_FLAG);
return 1;
}
if (op != BPF_SOCK_OPS_RTT_CB)
return 1;
tcp_sk = bpf_tcp_sock(sk);
if (!tcp_sk)
return 1;
storage->invoked++;
storage->dsack_dups = tcp_sk->dsack_dups;
storage->delivered = tcp_sk->delivered;
storage->delivered_ce = tcp_sk->delivered_ce;
storage->icsk_retransmits = tcp_sk->icsk_retransmits;
return 1;
}


@@ -1,9 +1,10 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2019 Facebook
#include <features.h>
typedef unsigned int u32;
static __attribute__((always_inline)) u32 rol32(u32 word, unsigned int shift)
static __always_inline u32 rol32(u32 word, unsigned int shift)
{
return (word << shift) | (word >> ((-shift) & 31));
}


@@ -54,7 +54,7 @@ struct sr6_tlv_t {
unsigned char value[0];
} BPF_PACKET_HEADER;
static __attribute__((always_inline)) struct ip6_srh_t *get_srh(struct __sk_buff *skb)
static __always_inline struct ip6_srh_t *get_srh(struct __sk_buff *skb)
{
void *cursor, *data_end;
struct ip6_srh_t *srh;
@@ -88,9 +88,9 @@ static __attribute__((always_inline)) struct ip6_srh_t *get_srh(struct __sk_buff
return srh;
}
static __attribute__((always_inline))
int update_tlv_pad(struct __sk_buff *skb, uint32_t new_pad,
uint32_t old_pad, uint32_t pad_off)
static __always_inline int update_tlv_pad(struct __sk_buff *skb,
uint32_t new_pad, uint32_t old_pad,
uint32_t pad_off)
{
int err;
@@ -118,10 +118,11 @@ int update_tlv_pad(struct __sk_buff *skb, uint32_t new_pad,
return 0;
}
static __attribute__((always_inline))
int is_valid_tlv_boundary(struct __sk_buff *skb, struct ip6_srh_t *srh,
uint32_t *tlv_off, uint32_t *pad_size,
uint32_t *pad_off)
static __always_inline int is_valid_tlv_boundary(struct __sk_buff *skb,
struct ip6_srh_t *srh,
uint32_t *tlv_off,
uint32_t *pad_size,
uint32_t *pad_off)
{
uint32_t srh_off, cur_off;
int offset_valid = 0;
@@ -177,9 +178,9 @@ int is_valid_tlv_boundary(struct __sk_buff *skb, struct ip6_srh_t *srh,
return 0;
}
static __attribute__((always_inline))
int add_tlv(struct __sk_buff *skb, struct ip6_srh_t *srh, uint32_t tlv_off,
struct sr6_tlv_t *itlv, uint8_t tlv_size)
static __always_inline int add_tlv(struct __sk_buff *skb,
struct ip6_srh_t *srh, uint32_t tlv_off,
struct sr6_tlv_t *itlv, uint8_t tlv_size)
{
uint32_t srh_off = (char *)srh - (char *)(long)skb->data;
uint8_t len_remaining, new_pad;


@@ -2,7 +2,7 @@
// Copyright (c) 2019 Facebook
#include <linux/bpf.h>
#include "bpf_helpers.h"
#define ATTR __attribute__((always_inline))
#define ATTR __always_inline
#include "test_jhash.h"
SEC("scale90_inline")


@@ -0,0 +1,31 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include "bpf_helpers.h"
struct bpf_map_def SEC("maps") tx_port = {
.type = BPF_MAP_TYPE_DEVMAP,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 8,
};
SEC("redirect_map_0")
int xdp_redirect_map_0(struct xdp_md *xdp)
{
return bpf_redirect_map(&tx_port, 0, 0);
}
SEC("redirect_map_1")
int xdp_redirect_map_1(struct xdp_md *xdp)
{
return bpf_redirect_map(&tx_port, 1, 0);
}
SEC("redirect_map_2")
int xdp_redirect_map_2(struct xdp_md *xdp)
{
return bpf_redirect_map(&tx_port, 2, 0);
}
char _license[] SEC("license") = "GPL";


@@ -0,0 +1,12 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include "bpf_helpers.h"
SEC("tx")
int xdp_tx(struct xdp_md *xdp)
{
return XDP_TX;
}
char _license[] SEC("license") = "GPL";


@@ -134,6 +134,16 @@ static struct sec_name_test tests[] = {
{0, BPF_PROG_TYPE_CGROUP_SYSCTL, BPF_CGROUP_SYSCTL},
{0, BPF_CGROUP_SYSCTL},
},
{
"cgroup/getsockopt",
{0, BPF_PROG_TYPE_CGROUP_SOCKOPT, BPF_CGROUP_GETSOCKOPT},
{0, BPF_CGROUP_GETSOCKOPT},
},
{
"cgroup/setsockopt",
{0, BPF_PROG_TYPE_CGROUP_SOCKOPT, BPF_CGROUP_SETSOCKOPT},
{0, BPF_CGROUP_SETSOCKOPT},
},
};
static int test_prog_type_by_name(const struct sec_name_test *test)

File diff suppressed because it is too large


@@ -0,0 +1,374 @@
// SPDX-License-Identifier: GPL-2.0
#include <error.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/filter.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include "bpf_rlimit.h"
#include "bpf_util.h"
#include "cgroup_helpers.h"
static int prog_attach(struct bpf_object *obj, int cgroup_fd, const char *title)
{
enum bpf_attach_type attach_type;
enum bpf_prog_type prog_type;
struct bpf_program *prog;
int err;
err = libbpf_prog_type_by_name(title, &prog_type, &attach_type);
if (err) {
log_err("Failed to deduce types for %s BPF program", title);
return -1;
}
prog = bpf_object__find_program_by_title(obj, title);
if (!prog) {
log_err("Failed to find %s BPF program", title);
return -1;
}
err = bpf_prog_attach(bpf_program__fd(prog), cgroup_fd,
attach_type, BPF_F_ALLOW_MULTI);
if (err) {
log_err("Failed to attach %s BPF program", title);
return -1;
}
return 0;
}
static int prog_detach(struct bpf_object *obj, int cgroup_fd, const char *title)
{
enum bpf_attach_type attach_type;
enum bpf_prog_type prog_type;
struct bpf_program *prog;
int err;
err = libbpf_prog_type_by_name(title, &prog_type, &attach_type);
if (err)
return -1;
prog = bpf_object__find_program_by_title(obj, title);
if (!prog)
return -1;
err = bpf_prog_detach2(bpf_program__fd(prog), cgroup_fd,
attach_type);
if (err)
return -1;
return 0;
}
static int run_getsockopt_test(struct bpf_object *obj, int cg_parent,
int cg_child, int sock_fd)
{
socklen_t optlen;
__u8 buf;
int err;
/* Set IP_TOS to the expected value (0x80). */
buf = 0x80;
err = setsockopt(sock_fd, SOL_IP, IP_TOS, &buf, 1);
if (err < 0) {
log_err("Failed to call setsockopt(IP_TOS)");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0x80) {
log_err("Unexpected getsockopt 0x%x != 0x80 without BPF", buf);
err = -1;
goto detach;
}
/* Attach child program and make sure it returns new value:
* - kernel: -> 0x80
* - child: 0x80 -> 0x90
*/
err = prog_attach(obj, cg_child, "cgroup/getsockopt/child");
if (err)
goto detach;
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0x90) {
log_err("Unexpected getsockopt 0x%x != 0x90", buf);
err = -1;
goto detach;
}
/* Attach parent program and make sure it returns new value:
* - kernel: -> 0x80
* - child: 0x80 -> 0x90
* - parent: 0x90 -> 0xA0
*/
err = prog_attach(obj, cg_parent, "cgroup/getsockopt/parent");
if (err)
goto detach;
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0xA0) {
log_err("Unexpected getsockopt 0x%x != 0xA0", buf);
err = -1;
goto detach;
}
/* Setting unexpected initial sockopt should return EPERM:
* - kernel: -> 0x40
* - child: unexpected 0x40, EPERM
* - parent: unexpected 0x40, EPERM
*/
buf = 0x40;
err = setsockopt(sock_fd, SOL_IP, IP_TOS, &buf, 1);
if (err < 0) {
log_err("Failed to call setsockopt(IP_TOS)");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (!err) {
log_err("Unexpected success from getsockopt(IP_TOS)");
/* Success here is a test failure; do not return 0. */
err = -1;
goto detach;
}
/* Detach child program and make sure we still get EPERM:
* - kernel: -> 0x40
* - parent: unexpected 0x40, EPERM
*/
err = prog_detach(obj, cg_child, "cgroup/getsockopt/child");
if (err) {
log_err("Failed to detach child program");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (!err) {
log_err("Unexpected success from getsockopt(IP_TOS)");
/* Success here is a test failure; do not return 0. */
err = -1;
goto detach;
}
/* Set initial value to the one the parent program expects:
* - kernel: -> 0x90
* - parent: 0x90 -> 0xA0
*/
buf = 0x90;
err = setsockopt(sock_fd, SOL_IP, IP_TOS, &buf, 1);
if (err < 0) {
log_err("Failed to call setsockopt(IP_TOS)");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0xA0) {
log_err("Unexpected getsockopt 0x%x != 0xA0", buf);
err = -1;
goto detach;
}
detach:
prog_detach(obj, cg_child, "cgroup/getsockopt/child");
prog_detach(obj, cg_parent, "cgroup/getsockopt/parent");
return err;
}
static int run_setsockopt_test(struct bpf_object *obj, int cg_parent,
int cg_child, int sock_fd)
{
socklen_t optlen;
__u8 buf;
int err;
/* Set IP_TOS to the expected value (0x80). */
buf = 0x80;
err = setsockopt(sock_fd, SOL_IP, IP_TOS, &buf, 1);
if (err < 0) {
log_err("Failed to call setsockopt(IP_TOS)");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0x80) {
log_err("Unexpected getsockopt 0x%x != 0x80 without BPF", buf);
err = -1;
goto detach;
}
/* Attach child program and make sure it adds 0x10. */
err = prog_attach(obj, cg_child, "cgroup/setsockopt");
if (err)
goto detach;
buf = 0x80;
err = setsockopt(sock_fd, SOL_IP, IP_TOS, &buf, 1);
if (err < 0) {
log_err("Failed to call setsockopt(IP_TOS)");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0x80 + 0x10) {
log_err("Unexpected getsockopt 0x%x != 0x80 + 0x10", buf);
err = -1;
goto detach;
}
/* Attach parent program and make sure it adds another 0x10. */
err = prog_attach(obj, cg_parent, "cgroup/setsockopt");
if (err)
goto detach;
buf = 0x80;
err = setsockopt(sock_fd, SOL_IP, IP_TOS, &buf, 1);
if (err < 0) {
log_err("Failed to call setsockopt(IP_TOS)");
goto detach;
}
buf = 0x00;
optlen = 1;
err = getsockopt(sock_fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto detach;
}
if (buf != 0x80 + 2 * 0x10) {
log_err("Unexpected getsockopt 0x%x != 0x80 + 2 * 0x10", buf);
err = -1;
goto detach;
}
detach:
prog_detach(obj, cg_child, "cgroup/setsockopt");
prog_detach(obj, cg_parent, "cgroup/setsockopt");
return err;
}
int main(int argc, char **argv)
{
struct bpf_prog_load_attr attr = {
.file = "./sockopt_multi.o",
};
int cg_parent = -1, cg_child = -1;
struct bpf_object *obj = NULL;
int sock_fd = -1;
int err = -1;
int ignored;
if (setup_cgroup_environment()) {
log_err("Failed to setup cgroup environment\n");
goto out;
}
cg_parent = create_and_get_cgroup("/parent");
if (cg_parent < 0) {
log_err("Failed to create cgroup /parent\n");
goto out;
}
cg_child = create_and_get_cgroup("/parent/child");
if (cg_child < 0) {
log_err("Failed to create cgroup /parent/child\n");
goto out;
}
if (join_cgroup("/parent/child")) {
log_err("Failed to join cgroup /parent/child\n");
goto out;
}
err = bpf_prog_load_xattr(&attr, &obj, &ignored);
if (err) {
log_err("Failed to load BPF object");
goto out;
}
sock_fd = socket(AF_INET, SOCK_STREAM, 0);
if (sock_fd < 0) {
log_err("Failed to create socket");
goto out;
}
if (run_getsockopt_test(obj, cg_parent, cg_child, sock_fd))
err = -1;
printf("test_sockopt_multi: getsockopt %s\n",
err ? "FAILED" : "PASSED");
if (run_setsockopt_test(obj, cg_parent, cg_child, sock_fd))
err = -1;
printf("test_sockopt_multi: setsockopt %s\n",
err ? "FAILED" : "PASSED");
out:
close(sock_fd);
bpf_object__close(obj);
close(cg_child);
close(cg_parent);
printf("test_sockopt_multi: %s\n", err ? "FAILED" : "PASSED");
return err ? EXIT_FAILURE : EXIT_SUCCESS;
}
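
The comments in run_getsockopt_test() describe a chain: the kernel value is handed to the child program, its result to the parent program, and any value a program does not expect ends in EPERM. The sketch below is only a userspace model of that chain, inferred from those comments. It is not the BPF program itself, and `struct hook`, `run_chain` and `chain` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one cgroup getsockopt hook: it accepts exactly
 * one input value and rewrites it; anything else is rejected.
 */
struct hook {
	uint8_t expect;	/* value the hook is prepared to see */
	uint8_t result;	/* value it passes on to the next stage */
};

/* Assumed order, taken from the test comments: child first, then parent. */
static const struct hook chain[] = {
	{ 0x80, 0x90 },	/* child:  0x80 -> 0x90 */
	{ 0x90, 0xA0 },	/* parent: 0x90 -> 0xA0 */
};

/* Run *opt through the first n hooks in order. Returns 0 with the final
 * value in *opt, or -EPERM as soon as one hook sees an unexpected value
 * (*opt then keeps whatever the last successful hook produced).
 */
static int run_chain(const struct hook *hooks, size_t n, uint8_t *opt)
{
	size_t i;

	assert(opt != NULL);
	for (i = 0; i < n; i++) {
		if (*opt != hooks[i].expect)
			return -EPERM;
		*opt = hooks[i].result;
	}
	return 0;
}
```

With both hooks and a starting value of 0x80 the model ends at 0xA0, with only the first hook it ends at 0x90, and a starting value of 0x40 is rejected by the first hook, which mirrors the sequence of checks in the test.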

@@ -0,0 +1,211 @@
// SPDX-License-Identifier: GPL-2.0
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/filter.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include "bpf_rlimit.h"
#include "bpf_util.h"
#include "cgroup_helpers.h"
#define CG_PATH "/sockopt"
#define SOL_CUSTOM 0xdeadbeef
static int getsetsockopt(void)
{
int fd, err;
union {
char u8[4];
__u32 u32;
} buf = {};
socklen_t optlen;
fd = socket(AF_INET, SOCK_STREAM, 0);
if (fd < 0) {
log_err("Failed to create socket");
return -1;
}
/* IP_TOS - BPF bypass */
buf.u8[0] = 0x08;
err = setsockopt(fd, SOL_IP, IP_TOS, &buf, 1);
if (err) {
log_err("Failed to call setsockopt(IP_TOS)");
goto err;
}
buf.u8[0] = 0x00;
optlen = 1;
err = getsockopt(fd, SOL_IP, IP_TOS, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(IP_TOS)");
goto err;
}
if (buf.u8[0] != 0x08) {
log_err("Unexpected getsockopt(IP_TOS) buf[0] 0x%02x != 0x08",
buf.u8[0]);
goto err;
}
/* IP_TTL - EPERM */
buf.u8[0] = 1;
err = setsockopt(fd, SOL_IP, IP_TTL, &buf, 1);
if (!err || errno != EPERM) {
log_err("setsockopt(IP_TTL) did not fail with EPERM");
goto err;
}
/* SOL_CUSTOM - handled by BPF */
buf.u8[0] = 0x01;
err = setsockopt(fd, SOL_CUSTOM, 0, &buf, 1);
if (err) {
log_err("Failed to call setsockopt");
goto err;
}
buf.u32 = 0x00;
optlen = 4;
err = getsockopt(fd, SOL_CUSTOM, 0, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt");
goto err;
}
if (optlen != 1) {
log_err("Unexpected optlen %d != 1", optlen);
goto err;
}
if (buf.u8[0] != 0x01) {
log_err("Unexpected buf[0] 0x%02x != 0x01", buf.u8[0]);
goto err;
}
/* SO_SNDBUF is overwritten */
buf.u32 = 0x01010101;
err = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &buf, 4);
if (err) {
log_err("Failed to call setsockopt(SO_SNDBUF)");
goto err;
}
buf.u32 = 0x00;
optlen = 4;
err = getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &buf, &optlen);
if (err) {
log_err("Failed to call getsockopt(SO_SNDBUF)");
goto err;
}
if (buf.u32 != 0x55AA*2) {
log_err("Unexpected getsockopt(SO_SNDBUF) 0x%x != 0x55AA*2",
buf.u32);
goto err;
}
close(fd);
return 0;
err:
close(fd);
return -1;
}
static int prog_attach(struct bpf_object *obj, int cgroup_fd, const char *title)
{
enum bpf_attach_type attach_type;
enum bpf_prog_type prog_type;
struct bpf_program *prog;
int err;
err = libbpf_prog_type_by_name(title, &prog_type, &attach_type);
if (err) {
log_err("Failed to deduce types for %s BPF program", title);
return -1;
}
prog = bpf_object__find_program_by_title(obj, title);
if (!prog) {
log_err("Failed to find %s BPF program", title);
return -1;
}
err = bpf_prog_attach(bpf_program__fd(prog), cgroup_fd,
attach_type, 0);
if (err) {
log_err("Failed to attach %s BPF program", title);
return -1;
}
return 0;
}
static int run_test(int cgroup_fd)
{
struct bpf_prog_load_attr attr = {
.file = "./sockopt_sk.o",
};
struct bpf_object *obj;
int ignored;
int err;
err = bpf_prog_load_xattr(&attr, &obj, &ignored);
if (err) {
log_err("Failed to load BPF object");
return -1;
}
err = prog_attach(obj, cgroup_fd, "cgroup/getsockopt");
if (err)
goto close_bpf_object;
err = prog_attach(obj, cgroup_fd, "cgroup/setsockopt");
if (err)
goto close_bpf_object;
err = getsetsockopt();
close_bpf_object:
bpf_object__close(obj);
return err;
}
int main(int args, char **argv)
{
int cgroup_fd;
int err = EXIT_SUCCESS;
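/* Setup failures must not be reported as success. */
if (setup_cgroup_environment()) {
err = EXIT_FAILURE;
goto cleanup_obj;
}
cgroup_fd = create_and_get_cgroup(CG_PATH);
if (cgroup_fd < 0) {
err = EXIT_FAILURE;
goto cleanup_cgroup_env;
}
if (join_cgroup(CG_PATH)) {
err = EXIT_FAILURE;
goto cleanup_cgroup;
}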
if (run_test(cgroup_fd))
err = EXIT_FAILURE;
printf("test_sockopt_sk: %s\n",
err == EXIT_SUCCESS ? "PASSED" : "FAILED");
cleanup_cgroup:
close(cgroup_fd);
cleanup_cgroup_env:
cleanup_cgroup_environment();
cleanup_obj:
return err;
}

@@ -0,0 +1,254 @@
// SPDX-License-Identifier: GPL-2.0
#include <error.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <pthread.h>
#include <linux/filter.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include "bpf_rlimit.h"
#include "bpf_util.h"
#include "cgroup_helpers.h"
#define CG_PATH "/tcp_rtt"
struct tcp_rtt_storage {
__u32 invoked;
__u32 dsack_dups;
__u32 delivered;
__u32 delivered_ce;
__u32 icsk_retransmits;
};
static void send_byte(int fd)
{
char b = 0x55;
if (write(fd, &b, sizeof(b)) != 1)
error(1, errno, "Failed to send single byte");
}
static int verify_sk(int map_fd, int client_fd, const char *msg, __u32 invoked,
__u32 dsack_dups, __u32 delivered, __u32 delivered_ce,
__u32 icsk_retransmits)
{
int err = 0;
struct tcp_rtt_storage val;
if (bpf_map_lookup_elem(map_fd, &client_fd, &val) < 0)
error(1, errno, "Failed to read socket storage");
if (val.invoked != invoked) {
log_err("%s: unexpected bpf_tcp_sock.invoked %d != %d",
msg, val.invoked, invoked);
err++;
}
if (val.dsack_dups != dsack_dups) {
log_err("%s: unexpected bpf_tcp_sock.dsack_dups %d != %d",
msg, val.dsack_dups, dsack_dups);
err++;
}
if (val.delivered != delivered) {
log_err("%s: unexpected bpf_tcp_sock.delivered %d != %d",
msg, val.delivered, delivered);
err++;
}
if (val.delivered_ce != delivered_ce) {
log_err("%s: unexpected bpf_tcp_sock.delivered_ce %d != %d",
msg, val.delivered_ce, delivered_ce);
err++;
}
if (val.icsk_retransmits != icsk_retransmits) {
log_err("%s: unexpected bpf_tcp_sock.icsk_retransmits %d != %d",
msg, val.icsk_retransmits, icsk_retransmits);
err++;
}
return err;
}
static int connect_to_server(int server_fd)
{
struct sockaddr_storage addr;
socklen_t len = sizeof(addr);
int fd;
fd = socket(AF_INET, SOCK_STREAM, 0);
if (fd < 0) {
log_err("Failed to create client socket");
return -1;
}
if (getsockname(server_fd, (struct sockaddr *)&addr, &len)) {
log_err("Failed to get server addr");
goto out;
}
if (connect(fd, (const struct sockaddr *)&addr, len) < 0) {
log_err("Failed to connect to server");
goto out;
}
return fd;
out:
close(fd);
return -1;
}
static int run_test(int cgroup_fd, int server_fd)
{
struct bpf_prog_load_attr attr = {
.prog_type = BPF_PROG_TYPE_SOCK_OPS,
.file = "./tcp_rtt.o",
.expected_attach_type = BPF_CGROUP_SOCK_OPS,
};
struct bpf_object *obj;
struct bpf_map *map;
int client_fd;
int prog_fd;
int map_fd;
int err;
err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);
if (err) {
log_err("Failed to load BPF object");
return -1;
}
map = bpf_map__next(NULL, obj);
map_fd = bpf_map__fd(map);
err = bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_SOCK_OPS, 0);
if (err) {
log_err("Failed to attach BPF program");
goto close_bpf_object;
}
client_fd = connect_to_server(server_fd);
if (client_fd < 0) {
err = -1;
goto close_bpf_object;
}
err += verify_sk(map_fd, client_fd, "syn-ack",
/*invoked=*/1,
/*dsack_dups=*/0,
/*delivered=*/1,
/*delivered_ce=*/0,
/*icsk_retransmits=*/0);
send_byte(client_fd);
err += verify_sk(map_fd, client_fd, "first payload byte",
/*invoked=*/2,
/*dsack_dups=*/0,
/*delivered=*/2,
/*delivered_ce=*/0,
/*icsk_retransmits=*/0);
close(client_fd);
close_bpf_object:
bpf_object__close(obj);
return err;
}
static int start_server(void)
{
struct sockaddr_in addr = {
.sin_family = AF_INET,
.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
};
int fd;
fd = socket(AF_INET, SOCK_STREAM, 0);
if (fd < 0) {
log_err("Failed to create server socket");
return -1;
}
if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
log_err("Failed to bind socket");
close(fd);
return -1;
}
return fd;
}
static void *server_thread(void *arg)
{
struct sockaddr_storage addr;
socklen_t len = sizeof(addr);
int fd = *(int *)arg;
int client_fd;
if (listen(fd, 1) < 0)
error(1, errno, "Failed to listen on socket");
client_fd = accept(fd, (struct sockaddr *)&addr, &len);
if (client_fd < 0)
error(1, errno, "Failed to accept client");
/* Wait for the next connection (that never arrives)
* to keep this thread alive to prevent calling
* close() on client_fd.
*/
if (accept(fd, (struct sockaddr *)&addr, &len) >= 0)
error(1, errno, "Unexpected success in second accept");
close(client_fd);
return NULL;
}
int main(int args, char **argv)
{
int server_fd, cgroup_fd;
int err = EXIT_SUCCESS;
pthread_t tid;
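/* Setup failures must not be reported as success. */
if (setup_cgroup_environment()) {
err = EXIT_FAILURE;
goto cleanup_obj;
}
cgroup_fd = create_and_get_cgroup(CG_PATH);
if (cgroup_fd < 0) {
err = EXIT_FAILURE;
goto cleanup_cgroup_env;
}
if (join_cgroup(CG_PATH)) {
err = EXIT_FAILURE;
goto cleanup_cgroup;
}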
server_fd = start_server();
if (server_fd < 0) {
err = EXIT_FAILURE;
goto cleanup_cgroup;
}
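/* pthread_create() returns a non-zero error number on failure. */
if (pthread_create(&tid, NULL, server_thread, (void *)&server_fd)) {
log_err("Failed to start server thread");
close(server_fd);
err = EXIT_FAILURE;
goto cleanup_cgroup;
}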
if (run_test(cgroup_fd, server_fd))
err = EXIT_FAILURE;
close(server_fd);
printf("test_tcp_rtt: %s\n",
err == EXIT_SUCCESS ? "PASSED" : "FAILED");
cleanup_cgroup:
close(cgroup_fd);
cleanup_cgroup_env:
cleanup_cgroup_environment();
cleanup_obj:
return err;
}

@@ -0,0 +1,118 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
#
# Create 3 namespaces with 3 veth peers, and
# forward packets in-between using native XDP
#
#                       XDP_TX
# NS1(veth11)        NS2(veth22)        NS3(veth33)
#      |                  |                  |
#      |                  |                  |
#   (veth1,            (veth2,            (veth3,
#   id:111)            id:122)            id:133)
#     ^ |                ^ |                ^ |
#     | |  XDP_REDIRECT  | |  XDP_REDIRECT  | |
#     | ------------------ ------------------ |
#     -----------------------------------------
#                    XDP_REDIRECT
# Kselftest framework requirement - SKIP code is 4.
ksft_skip=4
TESTNAME=xdp_veth
BPF_FS=$(awk '$3 == "bpf" {print $2; exit}' /proc/mounts)
BPF_DIR=$BPF_FS/test_$TESTNAME
_cleanup()
{
set +e
ip link del veth1 2> /dev/null
ip link del veth2 2> /dev/null
ip link del veth3 2> /dev/null
ip netns del ns1 2> /dev/null
ip netns del ns2 2> /dev/null
ip netns del ns3 2> /dev/null
rm -rf $BPF_DIR 2> /dev/null
}
cleanup_skip()
{
echo "selftests: $TESTNAME [SKIP]"
_cleanup
exit $ksft_skip
}
cleanup()
{
if [ "$?" = 0 ]; then
echo "selftests: $TESTNAME [PASS]"
else
echo "selftests: $TESTNAME [FAILED]"
fi
_cleanup
}
if [ $(id -u) -ne 0 ]; then
echo "selftests: $TESTNAME [SKIP] Need root privileges"
exit $ksft_skip
fi
if ! ip link set dev lo xdp off > /dev/null 2>&1; then
echo "selftests: $TESTNAME [SKIP] Could not run test without the ip xdp support"
exit $ksft_skip
fi
if [ -z "$BPF_FS" ]; then
echo "selftests: $TESTNAME [SKIP] Could not run test without bpffs mounted"
exit $ksft_skip
fi
if ! bpftool version > /dev/null 2>&1; then
echo "selftests: $TESTNAME [SKIP] Could not run test without bpftool"
exit $ksft_skip
fi
set -e
trap cleanup_skip EXIT
ip netns add ns1
ip netns add ns2
ip netns add ns3
ip link add veth1 index 111 type veth peer name veth11 netns ns1
ip link add veth2 index 122 type veth peer name veth22 netns ns2
ip link add veth3 index 133 type veth peer name veth33 netns ns3
ip link set veth1 up
ip link set veth2 up
ip link set veth3 up
ip -n ns1 addr add 10.1.1.11/24 dev veth11
ip -n ns3 addr add 10.1.1.33/24 dev veth33
ip -n ns1 link set dev veth11 up
ip -n ns2 link set dev veth22 up
ip -n ns3 link set dev veth33 up
mkdir $BPF_DIR
bpftool prog loadall \
xdp_redirect_map.o $BPF_DIR/progs type xdp \
pinmaps $BPF_DIR/maps
bpftool map update pinned $BPF_DIR/maps/tx_port key 0 0 0 0 value 122 0 0 0
bpftool map update pinned $BPF_DIR/maps/tx_port key 1 0 0 0 value 133 0 0 0
bpftool map update pinned $BPF_DIR/maps/tx_port key 2 0 0 0 value 111 0 0 0
ip link set dev veth1 xdp pinned $BPF_DIR/progs/redirect_map_0
ip link set dev veth2 xdp pinned $BPF_DIR/progs/redirect_map_1
ip link set dev veth3 xdp pinned $BPF_DIR/progs/redirect_map_2
ip -n ns1 link set dev veth11 xdp obj xdp_dummy.o sec xdp_dummy
ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec tx
ip -n ns3 link set dev veth33 xdp obj xdp_dummy.o sec xdp_dummy
trap cleanup EXIT
ip netns exec ns1 ping -c 1 -W 1 10.1.1.33
exit 0
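
The three `bpftool map update` commands above program tx_port so that the veth devices form a ring: 111 forwards to 122, 122 to 133, and 133 back to 111. The sketch below models only that lookup in plain C. It assumes that the program pinned as redirect_map_N looks up key N, which the script suggests but does not show, and `next_hop` and `walk` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* ifindex values assigned by the script to veth1, veth2 and veth3. */
static const int ifindex[3] = { 111, 122, 133 };

/* tx_port contents as written by the script: key i -> target ifindex.
 * Assumption: redirect_map_i, attached to ifindex[i], reads key i.
 */
static const int tx_port[3] = { 122, 133, 111 };

/* Return the ifindex a packet arriving on in_ifindex is redirected to,
 * or -1 if no program in this model is attached to that device.
 */
static int next_hop(int in_ifindex)
{
	size_t i;

	for (i = 0; i < 3; i++)
		if (ifindex[i] == in_ifindex)
			return tx_port[i];
	return -1;
}

/* Follow the redirects for a given number of hops, starting at start. */
static int walk(int start, int hops)
{
	int cur = start;

	assert(hops >= 0);
	while (hops-- > 0 && cur >= 0)
		cur = next_hop(cur);
	return cur;
}
```

In this model a packet entering on veth1 reaches veth3 after two redirects, which is the path the final ping from ns1 to 10.1.1.33 relies on, and a third redirect carries the reply path back to veth1.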