Commit Graph

99142 Commits

Author SHA1 Message Date
Jakub Kicinski
878db9f0f2 pkt_cls: add new tc cls helper to check offload flag and chain index
Very few (mlxsw) upstream drivers seem to allow offload of chains
other than 0.  Save driver developers typing and add a helper for
checking both if ethtool's TC offload flag is on and if chain is 0.
This helper will set the extack appropriately in both error cases.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 21:23:07 -05:00
Lawrence Brakmo
d44874910a bpf: Add BPF_SOCK_OPS_STATE_CB
Adds support for calling sock_ops BPF program when there is a TCP state
change. Two arguments are used; one for the old state and another for
the new state.

There is a new enum in include/uapi/linux/bpf.h that exports the TCP
states that prepends BPF_ to the current TCP state names. If it is ever
necessary to change the internal TCP state values (other than adding
more to the end), then it will become necessary to convert from the
internal TCP state value to the BPF value before calling the BPF
sock_ops function. There are a set of compile checks added in tcp.c
to detect if the internal and BPF values differ so we can make the
necessary fixes.

New op: BPF_SOCK_OPS_STATE_CB.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 16:41:15 -08:00
Lawrence Brakmo
a31ad29e6a bpf: Add BPF_SOCK_OPS_RETRANS_CB
Adds support for calling sock_ops BPF program when there is a
retransmission. Three arguments are used; one for the sequence number,
another for the number of segments retransmitted, and the last one for
the return value of tcp_transmit_skb (0 => success).
Does not include syn-ack retransmissions.

New op: BPF_SOCK_OPS_RETRANS_CB.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 16:41:14 -08:00
Lawrence Brakmo
44f0e43037 bpf: Add support for reading sk_state and more
Add support for reading many more tcp_sock fields

  state,	same as sk->sk_state
  rtt_min	same as sk->rtt_min.s[0].v (current rtt_min)
  snd_ssthresh
  rcv_nxt
  snd_nxt
  snd_una
  mss_cache
  ecn_flags
  rate_delivered
  rate_interval_us
  packets_out
  retrans_out
  total_retrans
  segs_in
  data_segs_in
  segs_out
  data_segs_out
  lost_out
  sacked_out
  sk_txhash
  bytes_received (__u64)
  bytes_acked    (__u64)

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 16:41:14 -08:00
Lawrence Brakmo
f89013f66d bpf: Add sock_ops RTO callback
Adds an optional call to sock_ops BPF program based on whether the
BPF_SOCK_OPS_RTO_CB_FLAG is set in bpf_sock_ops_flags.
The BPF program is passed 2 arguments: icsk_retransmits and whether the
RTO has expired.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 16:41:14 -08:00
Lawrence Brakmo
b13d880721 bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock
Adds field bpf_sock_ops_cb_flags to tcp_sock and bpf_sock_ops. Its primary
use is to determine if there should be calls to sock_ops bpf program at
various points in the TCP code. The field is initialized to zero,
disabling the calls. A sock_ops BPF program can set it, per connection and
as necessary, when the connection is established.

It also adds support for reading and writting the field within a
sock_ops BPF program. Reading is done by accessing the field directly.
However, writing is done through the helper function
bpf_sock_ops_cb_flags_set, in order to return an error if a BPF program
is trying to set a callback that is not supported in the current kernel
(i.e. running an older kernel). The helper function returns 0 if it was
able to set all of the bits set in the argument, a positive number
containing the bits that could not be set, or -EINVAL if the socket is
not a full TCP socket.

Examples of where one could call the bpf program:

1) When RTO fires
2) When a packet is retransmitted
3) When the connection terminates
4) When a packet is sent
5) When a packet is received

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 16:41:14 -08:00
Lawrence Brakmo
de525be2ca bpf: Support passing args to sock_ops bpf function
Adds support for passing up to 4 arguments to sock_ops bpf functions. It
reusues the reply union, so the bpf_sock_ops structures are not
increased in size.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 16:41:14 -08:00
Lawrence Brakmo
b73042b8a2 bpf: Add write access to tcp_sock and sock fields
This patch adds a macro, SOCK_OPS_SET_FIELD, for writing to
struct tcp_sock or struct sock fields. This required adding a new
field "temp" to struct bpf_sock_ops_kern for temporary storage that
is used by sock_ops_convert_ctx_access. It is used to store and recover
the contents of a register, so the register can be used to store the
address of the sk. Since we cannot overwrite the dst_reg because it
contains the pointer to ctx, nor the src_reg since it contains the value
we want to store, we need an extra register to contain the address
of the sk.

Also adds the macro SOCK_OPS_GET_OR_SET_FIELD that calls one of the
GET or SET macros depending on the value of the TYPE field.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-25 16:41:14 -08:00
Eric Biggers
01950a349e fs/buffer.c: fold init_buffer() into init_page_buffers()
Since commit e76004093d ("fs/buffer.c: remove unnecessary init
operation after allocating buffer_head"), there are no callers of
init_buffer() outside of init_page_buffers().  So just fold it into
init_page_buffers().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-25 19:34:28 -05:00
Eric Biggers
4bfd054ae1 fs: fold __inode_permission() into inode_permission()
Since commit 9c630ebefe ("ovl: simplify permission checking"),
overlayfs doesn't call __inode_permission() anymore, which leaves no
users other than inode_permission().  So just fold it back into
inode_permission().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-25 19:34:28 -05:00
Jürg Billeter
e1fc742e14 fs: add RWF_APPEND
This is the per-I/O equivalent of O_APPEND to support atomic append
operations on any open file.

If a file is opened with O_APPEND, pwrite() ignores the offset and
always appends data to the end of the file. RWF_APPEND enables atomic
append and pwrite() with offset on a single file descriptor.

Signed-off-by: Jürg Billeter <j@bitron.ch>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-25 19:32:55 -05:00
Chao Yu
1c1d35df71 f2fs: support inode creation time
This patch adds creation time field in inode layout to support showing
kstat.btime in ->statx.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 14:10:39 -08:00
David S. Miller
525d0ae7a2 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2018-01-25

Here's one last bluetooth-next pull request for the 4.16 kernel:

 - Improved support for Intel controllers
 - New set_parity method to serdev (agreed with maintainers to be taken
   through bluetooth-next)
 - Fix error path in hci_bcm (missing call to serdev close)
 - New ID for BCM4343A0 UART controller

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 16:32:28 -05:00
Nicolas Dichtel
f15ca723c1 net: don't call update_pmtu unconditionally
Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
"BUG: unable to handle kernel NULL pointer dereference at           (null)"

Let's add a helper to check if update_pmtu is available before calling it.

Fixes: 52a589d51f ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff44 ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl <code@rkapl.cz>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 16:27:34 -05:00
Dan Streetman
4ee806d511 net: tcp: close sock if net namespace is exiting
When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open.  However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open.  In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence.  The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:

unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 10:56:45 -05:00
Rob Herring
7e2978430f PCI: Make of_irq_parse_pci() static
Now that the DT PCI code is merged into drivers/pci, of_irq_parse_pci() can
be static.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Frank Rowand <frowand.list@gmail.com>
2018-01-25 08:48:20 -06:00
Ard Biesheuvel
6657674b23 crypto: sha3-generic - export init/update/final routines
To allow accelerated implementations to fall back to the generic
routines, e.g., in contexts where a SIMD based implementation is
not allowed to run, expose the generic SHA3 init/update/final
routines to other modules.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-01-26 01:10:34 +11:00
Ard Biesheuvel
beeb504adf crypto: sha3-generic - simplify code
In preparation of exposing the generic SHA3 implementation to other
versions as a fallback, simplify the code, and remove an inconsistency
in the output handling (endian swabbing rsizw words of state before
writing the output does not make sense)

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-01-26 01:10:33 +11:00
Andy Shevchenko
c505cbd45f device property: Define type of PROPERTY_ENRTY_*() macros
Some of the drivers may use the macro at runtime flow, like

  struct property_entry p[10];
...
  p[index++] = PROPERTY_ENTRY_U8("u8 property", u8_data);

In that case and absence of the data type compiler fails the build:

drivers/char/ipmi/ipmi_dmi.c:79:29: error: Expected ; at end of statement
drivers/char/ipmi/ipmi_dmi.c:79:29: error: got {

Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Corey Minyard <minyard@acm.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-25 13:44:20 +01:00
David S. Miller
8ec59b44a0 Merge branch 'rebased-net-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 23:48:11 -05:00
David S. Miller
955bd1d216 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 23:44:15 -05:00
Linus Torvalds
5b7d27967d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Avoid negative netdev refcount in error flow of xfrm state add, from
    Aviad Yehezkel.

 2) Fix tcpdump decoding of IPSEC decap'd frames by filling in the
    ethernet header protocol field in xfrm{4,6}_mode_tunnel_input().
    From Yossi Kuperman.

 3) Fix a syzbot triggered skb_under_panic in pppoe having to do with
    failing to allocate an appropriate amount of headroom. From
    Guillaume Nault.

 4) Fix memory leak in vmxnet3 driver, from Neil Horman.

 5) Cure out-of-bounds packet memory access in em_nbyte EMATCH module,
    from Wolfgang Bumiller.

 6) Restrict what kinds of sockets can be bound to the KCM multiplexer
    and also disallow when another layer has attached to the socket and
    made use of sk_user_data. From Tom Herbert.

 7) Fix use before init of IOTLB in vhost code, from Jason Wang.

 8) Correct STACR register write bit definition in IBM emac driver, from
    Ivan Mikhaylov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net/ibm/emac: wrong bit is used for STA control register write
  net/ibm/emac: add 8192 rx/tx fifo size
  vhost: do not try to access device IOTLB when not initialized
  vhost: use mutex_lock_nested() in vhost_dev_lock_vqs()
  i40e: flower: check if TC offload is enabled on a netdev
  qed: Free reserved MR tid
  qed: Remove reserveration of dpi for kernel
  kcm: Check if sk_user_data already set in kcm_attach
  kcm: Only allow TCP sockets to be attached to a KCM mux
  net: sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr
  net: sched: em_nbyte: don't add the data offset twice
  mlxsw: spectrum_router: Don't log an error on missing neighbor
  vmxnet3: repair memory leak
  ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
  pppoe: take ->needed_headroom of lower device into account on xmit
  xfrm: fix boolean assignment in xfrm_get_type_offload
  xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version
  xfrm: fix error flow in case of add state fails
  xfrm: Add SA to hardware at the end of xfrm_state_construct()
2018-01-24 17:24:30 -08:00
Al Viro
5c59e564e4 kill kernel_sock_ioctl()
no users since 2014

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24 19:13:45 -05:00
Al Viro
44c02a2c3d dev_ioctl(): move copyin/copyout to callers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24 19:13:45 -05:00
Al Viro
b1b0c24506 lift handling of SIOCIW... out of dev_ioctl()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24 19:13:45 -05:00
Al Viro
ca25c30040 ip_rt_ioctl(): take copyin to caller
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24 19:13:45 -05:00
Al Viro
03aef17bb7 devinet_ioctl(): take copyin/copyout to caller
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24 19:13:45 -05:00
Al Viro
36fd633ec9 net: separate SIOCGIFCONF handling from dev_ioctl()
Only two of dev_ioctl() callers may pass SIOCGIFCONF to it.
Separating that codepath from the rest of dev_ioctl() allows both
to simplify dev_ioctl() itself (all other cases work with struct ifreq *)
*and* seriously simplify the compat side of that beast: all it takes
is passing to inet_gifconf() an extra argument - the size of individual
records (sizeof(struct ifreq) or sizeof(struct compat_ifreq)).  With
dev_ifconf() called directly from sock_do_ioctl()/compat_dev_ifconf()
that's easy to arrange.

As the result, compat side of SIOCGIFCONF doesn't need any
allocations, copy_in_user() back and forth, etc.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24 19:13:45 -05:00
William Tu
b423d13c08 net: erspan: fix use-after-free
When building the erspan header for either v1 or v2, the eth_hdr()
does not point to the right inner packet's eth_hdr,
causing kasan report use-after-free and slab-out-of-bouds read.

The patch fixes the following syzkaller issues:
[1] BUG: KASAN: slab-out-of-bounds in erspan_xmit+0x22d4/0x2430 net/ipv4/ip_gre.c:735
[2] BUG: KASAN: slab-out-of-bounds in erspan_build_header+0x3bf/0x3d0 net/ipv4/ip_gre.c:698
[3] BUG: KASAN: use-after-free in erspan_xmit+0x22d4/0x2430 net/ipv4/ip_gre.c:735
[4] BUG: KASAN: use-after-free in erspan_build_header+0x3bf/0x3d0 net/ipv4/ip_gre.c:698

[2] CPU: 0 PID: 3654 Comm: syzkaller377964 Not tainted 4.15.0-rc9+ #185
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:53
 print_address_description+0x73/0x250 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x25b/0x340 mm/kasan/report.c:409
 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:440
 erspan_build_header+0x3bf/0x3d0 net/ipv4/ip_gre.c:698
 erspan_xmit+0x3b8/0x13b0 net/ipv4/ip_gre.c:740
 __netdev_start_xmit include/linux/netdevice.h:4042 [inline]
 netdev_start_xmit include/linux/netdevice.h:4051 [inline]
 packet_direct_xmit+0x315/0x6b0 net/packet/af_packet.c:266
 packet_snd net/packet/af_packet.c:2943 [inline]
 packet_sendmsg+0x3aed/0x60b0 net/packet/af_packet.c:2968
 sock_sendmsg_nosec net/socket.c:638 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:648
 SYSC_sendto+0x361/0x5c0 net/socket.c:1729
 SyS_sendto+0x40/0x50 net/socket.c:1697
 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
 do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
 entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:129
RIP: 0023:0xf7fcfc79
RSP: 002b:00000000ffc6976c EFLAGS: 00000286 ORIG_RAX: 0000000000000171
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000020011000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020008000
RBP: 000000000000001c R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Fixes: f551c91de2 ("net: erspan: introduce erspan v2 for ip_gre")
Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Reported-by: syzbot+9723f2d288e49b492cf0@syzkaller.appspotmail.com
Reported-by: syzbot+f0ddeb2b032a8e1d9098@syzkaller.appspotmail.com
Reported-by: syzbot+f14b3703cd8d7670203f@syzkaller.appspotmail.com
Reported-by: syzbot+eefa384efad8d7997f20@syzkaller.appspotmail.com
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 16:53:17 -05:00
Jakub Kicinski
c846adb6be net: sched: remove tc_cls_common_offload_init_deprecated()
All users are now converted to tc_cls_common_offload_init().

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 16:01:11 -05:00
Jakub Kicinski
f558fdea03 cls_bpf: remove gen_flags from bpf_offload
cls_bpf now guarantees that only device-bound programs are
allowed with skip_sw.  The drivers no longer pay attention to
flags on filter load, therefore the bpf_offload member can be
removed.  If flags are needed again they should probably be
added to struct tc_cls_common_offload instead.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 16:01:10 -05:00
Jakub Kicinski
34832e1c70 net: sched: prepare for reimplementation of tc_cls_common_offload_init()
Rename the tc_cls_common_offload_init() helper function to
tc_cls_common_offload_init_deprecated() and add a new implementation
which also takes flags argument.  We will only set extack if flags
indicate that offload is forced (skip_sw) otherwise driver errors
should be ignored, as they don't influence the overall filter
installation.

Note that we need the tc_skip_hw() helper for new version, therefore
it is added later in the file.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 16:01:10 -05:00
Jakub Kicinski
715df5ecab net: sched: propagate extack to cls->destroy callbacks
Propagate extack to cls->destroy callbacks when called from
non-error paths.  On error paths pass NULL to avoid overwriting
the failure message.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 16:01:09 -05:00
Wolfgang Bumiller
d3303a65a0 net: sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr
TCF_LAYER_LINK and TCF_LAYER_NETWORK returned the same pointer as
skb->data points to the network header.
Use skb_mac_header instead.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 14:52:48 -05:00
Linus Torvalds
03fae44b41 Merge tag 'trace-v4.15-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "With the new ORC unwinder, ftrace stack tracing became disfunctional.

  One was that ORC didn't know how to handle the ftrace callbacks in
  general (which Josh fixed).

  The other was that ORC would just bail if it hit a dynamically
  allocated trampoline. Which means all ftrace stack tracing that
  happens from the function tracer would produce no results (that
  includes killing the max stack size tracer). I added a check to the
  ORC unwinder to see if the trampoline belonged to ftrace, and if it
  did, use the orc entry of the static trampoline that was used to
  create the dynamic one (it would be identical).

  Finally, I noticed that the skip values of the stack tracing were out
  of whack. I went through and fixed them up"

* tag 'trace-v4.15-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Update stack trace skipping for ORC unwinder
  ftrace, orc, x86: Handle ftrace dynamically allocated trampolines
  x86/ftrace: Fix ORC unwinding from ftrace handlers
2018-01-24 10:08:16 -08:00
Greg Kroah-Hartman
5132ede0fe Revert "module: Add retpoline tag to VERMAGIC"
This reverts commit 6cfb521ac0.

Turns out distros do not want to make retpoline as part of their "ABI",
so this patch should not have been merged.  Sorry Andi, this was my
fault, I suggested it when your original patch was the "correct" way of
doing this instead.

Reported-by: Jiri Kosina <jikos@kernel.org>
Fixes: 6cfb521ac0 ("module: Add retpoline tag to VERMAGIC")
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: rusty@rustcorp.com.au
Cc: arjan.van.de.ven@intel.com
Cc: jeyu@kernel.org
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-24 09:00:05 -08:00
Kuninori Morimoto
72c3818411 ASoC: soc-pcm: rename .pmdown_time to .use_pmdown_time for Component
commit fbb16563c6 ("ASoC: snd_soc_component_driver has pmdown_time")
added new .pmdown_time which is for inverted version of current
.ignore_pmdown_time
But it is confusable name. Let's rename it to .use_pmdown_time

Reported-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-01-24 16:44:22 +00:00
Miklos Szeredi
f9c34674bc vfs: factor out helpers d_instantiate_anon() and d_alloc_anon()
Those helpers are going to be used by overlayfs to implement
NFS export decode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:59 +01:00
Peter Zijlstra
ce48c14649 sched/core: Fix cpu.max vs. cpuhotplug deadlock
Tejun reported the following cpu-hotplug lock (percpu-rwsem) read recursion:

  tg_set_cfs_bandwidth()
    get_online_cpus()
      cpus_read_lock()

    cfs_bandwidth_usage_inc()
      static_key_slow_inc()
        cpus_read_lock()

Reported-by: Tejun Heo <tj@kernel.org>
Tested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180122215328.GP3397@worktop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-24 10:03:44 +01:00
Ben Hutchings
e9191ffb65 ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
Commit 513674b5a2 ("net: reevalulate autoflowlabel setting after
sysctl setting") removed the initialisation of
ipv6_pinfo::autoflowlabel and added a second flag to indicate
whether this field or the net namespace default should be used.

The getsockopt() handling for this case was not updated, so it
currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
not explicitly enabled.  Fix it to return the effective value, whether
that has been set at the socket or net namespace level.

Fixes: 513674b5a2 ("net: reevalulate autoflowlabel setting after sysctl ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23 19:53:24 -05:00
Davide Caratti
9c5f69bbd7 net/sched: act_csum: don't use spinlock in the fast path
use RCU instead of spin_{,unlock}_bh() to protect concurrent read/write on
act_csum configuration, to reduce the effects of contention in the data
path when multiple readers are present.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23 19:51:46 -05:00
Frederic Barrat
280b983ce2 ocxl: Add a kernel API for other opencapi drivers
Some of the functions done by the generic driver should also be needed
by other opencapi drivers: attaching a context to an adapter,
translation fault handling, AFU interrupt allocation...

So to avoid code duplication, the driver provides a kernel API that
other drivers can use, similar to calling a in-kernel library.

It is still a bit theoretical, for lack of real hardware, and will
likely need adjustements down the road. But we used the cxlflash
driver as a guinea pig.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:59 +11:00
Frederic Barrat
aeddad1760 ocxl: Add AFU interrupt support
Add user APIs through ioctl to allocate, free, and be notified of an
AFU interrupt.

For opencapi, an AFU can trigger an interrupt on the host by sending a
specific command targeting a 64-bit object handle. On POWER9, this is
implemented by mapping a special page in the address space of a
process and a write to that page will trigger an interrupt.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:58 +11:00
Frederic Barrat
5ef3166e8a ocxl: Driver code for 'generic' opencapi devices
Add an ocxl driver to handle generic opencapi devices. Of course, it's
not meant to be the only opencapi driver, any device is free to
implement its own. But if a host application only needs basic services
like attaching to an opencapi adapter, have translation faults handled
or allocate AFU interrupts, it should suffice.

The AFU config space must follow the opencapi specification and use
the expected vendor/device ID to be seen by the generic driver.

The driver exposes the device AFUs as a char device in /dev/ocxl/

Note that the driver currently doesn't handle memory attached to the
opencapi device.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:58 +11:00
Frederic Barrat
2cb3d64b26 powerpc/powernv: Capture actag information for the device
In the opencapi protocol, host memory contexts are referenced by a
'actag'. During setup, a driver must tell the device how many actags
it can used, and what values are acceptable.

On POWER9, the NPU can handle 64 actags per link, so they must be
shared between all the PCI functions of the link. To get a global
picture of how many actags are used by each AFU of every function, we
capture some data at the end of PCI enumeration, so that actags can be
shared fairly if needed.

This is not powernv specific per say, but rather a consequence of the
opencapi configuration specification being quite general. The number
of available actags on POWER9 makes it more likely to be hit. This is
somewhat mitigated by the fact that existing AFUs are coded by
requesting a reasonable count of actags and existing devices carry
only one AFU.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:57 +11:00
Steven Rostedt (VMware)
6be7fa3c74 ftrace, orc, x86: Handle ftrace dynamically allocated trampolines
The function tracer can create a dynamically allocated trampoline that is
called by the function mcount or fentry hook that is used to call the
function callback that is registered. The problem is that the orc undwinder
will bail if it encounters one of these trampolines. This breaks the stack
trace of function callbacks, which include the stack tracer and setting the
stack trace for individual functions.

Since these dynamic trampolines are basically copies of the static ftrace
trampolines defined in ftrace_*.S, we do not need to create new orc entries
for the dynamic trampolines. Finding the return address on the stack will be
identical as the functions that were copied to create the dynamic
trampolines. When encountering a ftrace dynamic trampoline, we can just use
the orc entry of the ftrace static function that was copied for that
trampoline.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 15:56:55 -05:00
Jay Cornwall
430a23689d PCI: Add pci_enable_atomic_ops_to_root()
The Atomic Operations feature (PCIe r4.0, sec 6.15) allows atomic
transctions to be requested by, routed through and completed by PCIe
components. Routing and completion do not require software support.
Component support for each is detectable via the DEVCAP2 register.

A Requester may use AtomicOps only if its PCI_EXP_DEVCTL2_ATOMIC_REQ is
set. This should be set only if the Completer and all intermediate routing
elements support AtomicOps.

A concrete example is the AMD Fiji-class GPU (which is capable of making
AtomicOp requests), below a PLX 8747 switch (advertising AtomicOp routing)
with a Haswell host bridge (advertising AtomicOp completion support).

Add pci_enable_atomic_ops_to_root() for per-device control over AtomicOp
requests. This checks to be sure the Root Port supports completion of the
desired AtomicOp sizes and the path to the Root Port supports routing the
AtomicOps.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[bhelgaas: changelog, comments, whitespace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-01-23 14:46:50 -06:00
Trond Myklebust
8f39fce84a Merge tag 'nfs-rdma-for-4.16-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
NFS-over-RDMA client updates for Linux 4.16

New features:
- xprtrdma tracepoints

Bugfixes and cleanups:
- Fix memory leak if rpcrdma_buffer_create() fails
- Fix allocating extra rpcrdma_reps for the backchannel
- Remove various unused and redundant variables and lock cycles
- Fix IPv6 support in xprt_rdma_set_port()
- Fix memory leak by calling buf_free for callback replies
- Fix "bytes registered" accounting
- Fix kernel-doc comments
- SUNRPC tracepoint cleanups for consistent information
- Optimizations for __rpc_execute()
2018-01-23 14:55:50 -05:00
David S. Miller
5ca114400d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in
'net'.

The esp{4,6}_offload.c conflicts were overlapping changes.
The 'out' label is removed so we just return ERR_PTR(-EINVAL)
directly.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23 13:51:56 -05:00
Benjamin Coddington
0be283f676 SUNRPC: Fix null rpc_clnt dereference in rpc_task_queued tracepoint
Backchannel tasks will not have a reference to the rpc_clnt.  Return -1 for
cl_clid in that case.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trondmy@gmail.com>
2018-01-23 13:18:11 -05:00