Currently block plug holds up to 16 non-mergeable requests. This makes
sense if the request size is small, eg, reduce lock contention. But if
request size is big enough, we don't need to worry about lock
contention. Holding such request makes no sense and it lows the disk
utilization.
In practice, this improves 10% throughput for my raid5 sequential write
workload.
The size (128k) is arbitrary right now, but it makes sure lock
contention is small. This probably could be more intelligent, eg, check
average request size holded. Since this is mainly for sequential IO,
probably not worthy.
V2: check the last request instead of the first request, so as long as
there is one big size request we flush the plug.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The validation for it ends up being quite simple, but I hadn't got
around to it before merging the driver. For backwards compatibility,
we also need to add a flag so that the userspace GL driver can easily
tell if the kernel will allow ETC1 textures (on an old kernel, it will
continue to convert to RGBA8)
Signed-off-by: Eric Anholt <eric@anholt.net>
Support matching on SCTP ports in the same way that matching
on TCP and UDP ports is already supported.
Example usage:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 ip_proto sctp dst_port 80 \
action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some configurations (e.g. geneve interface with default
MTU of 1500 over an ethernet interface with 1500 MTU) result
in the transmission of packets that exceed the configured MTU.
While this should be considered to be a "bad" configuration,
it is still allowed and should not result in the sending
of packets that exceed the configured MTU.
Fix by dropping the assumption in ip_finish_output_gso() that
locally originated gso packets will never need fragmentation.
Basic testing using iperf (observing CPU usage and bandwidth)
have shown no measurable performance impact for traffic not
requiring fragmentation.
Fixes: c7ba65d7b6 ("net: ip: push gso skb forwarding handling down the stack")
Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When reading a datagram or raw packet that arrived fragmented, expose
the maximum fragment size if recorded to allow applications to
estimate receive path MTU.
At this point, the field is only recorded when ipv6 connection
tracking is enabled. A follow-up patch will record this field also
in the ipv6 input path.
Tested using the test for IP_RECVFRAGSIZE plus
ip netns exec to ip addr add dev veth1 fc07::1/64
ip netns exec from ip addr add dev veth0 fc07::2/64
ip netns exec to ./recv_cmsg_recvfragsize -6 -u -p 6000 &
ip netns exec from nc -q 1 -u fc07::1 6000 < payload
Both with and without enabling connection tracking
ip6tables -A INPUT -m state --state NEW -p udp -j LOG
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IP stack records the largest fragment of a reassembled packet
in IPCB(skb)->frag_max_size. When reading a datagram or raw packet
that arrived fragmented, expose the value to allow applications to
estimate receive path MTU.
Tested:
Sent data over a veth pair of which the source has a small mtu.
Sent data using netcat, received using a dedicated process.
Verified that the cmsg IP_RECVFRAGSIZE is returned only when
data arrives fragmented, and in that cases matches the veth mtu.
ip link add veth0 type veth peer name veth1
ip netns add from
ip netns add to
ip link set dev veth1 netns to
ip netns exec to ip addr add dev veth1 192.168.10.1/24
ip netns exec to ip link set dev veth1 up
ip link set dev veth0 netns from
ip netns exec from ip addr add dev veth0 192.168.10.2/24
ip netns exec from ip link set dev veth0 up
ip netns exec from ip link set dev veth0 mtu 1300
ip netns exec from ethtool -K veth0 ufo off
dd if=/dev/zero bs=1 count=1400 2>/dev/null > payload
ip netns exec to ./recv_cmsg_recvfragsize -4 -u -p 6000 &
ip netns exec from nc -q 1 -u 192.168.10.1 6000 < payload
using github.com/wdebruij/kerneltools/blob/master/tests/recvfragsize.c
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not all use cases can support Doi3. Only certain use cases like hot word
detection, deep buffering can support D0i3 based on resource requirement.
So, pass the D0i3 capability for the FE/BE copier using topology. This will
be used to take a decision for D0i3 mode entry/exit.
Signed-off-by: Jayachandran B <jayachandran.b@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
For D0i3, we need to tell DSP to run the pipelines in LP mode. This
information is kept in topology and passed to driver as an attribute
for pipe.
So add a new tuple for lpmode and program the pipe based on value set.
Signed-off-by: Jayachandran B <jayachandran.b@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
This is the remaining update to PCM ABI object of version 5.
The flags will be applied to FE (Front End) links and can also be used
by physical links. The private data is reserved for future extension, so
offset update will add the private data size.
Now user space is using ABI v4, and the previous patch "ASoC: topology:
make PCM backward compatible from ABI v4" can assure the backward
compatibility.
Signed-off-by: Mengdong Lin <mengdong.lin@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Users start to use topology ABI from v4. ABI v5 updated existing manifest
and PCM elements. Two previous patches can support these ABI updates in a
backward compatible way. So if the topology file from user space is
generated by ABI v4, kernel will no longer quit but continue parsing.
Signed-off-by: Mengdong Lin <mengdong.lin@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
This field is only useful for nf_queue, so store it in the
nf_queue_entry structure instead, away from the core path. Pass
hook_head to nf_hook_slow().
Since we always have a valid entry on the first iteration in
nf_iterate(), we can use 'do { ... } while (entry)' loop instead.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Don't copy relevant fields from hook state structure, instead use the
one that is already available in struct xt_action_param.
This patch also adds a set of new wrapper functions to fetch relevant
hook state structure fields.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Place pointer to hook state in xt_action_param structure instead of
copying the fields that we need. After this change xt_action_param fits
into one cacheline.
This patch also adds a set of new wrapper functions to fetch relevant
hook state structure fields.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
NF_STOP is only used by br_netfilter these days, and it can be emulated
with a combination of NF_STOLEN plus explicit call to the ->okfn()
function as Florian suggests.
To retain binary compatibility with userspace nf_queue application, we
have to keep NF_STOP around, so libnetfilter_queue userspace userspace
applications still work if they use NF_STOP for some exotic reason.
Out of tree modules using NF_STOP would break, but we don't care about
those.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patch c5136b15ea ("netfilter: bridge: add and use br_nf_hook_thresh")
introduced br_nf_hook_thresh().
Replace NF_HOOK_THRESH() by br_nf_hook_thresh from
br_nf_forward_finish(), so we have no more callers for this macro.
As a result, state->thresh and explicit thresh parameter in the hook
state structure is not required anymore. And we can get rid of
skip-hook-under-thresh loop in nf_iterate() in the core path that is
only used by br_netfilter to search for the filter hook.
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now that we have a helper to gather periodic
endpoints' multiplier bits from wMaxPacketSize and
every driver is using it, we can safely make sure
that usb_endpoint_maxp() returns only bits 10:0 of
wMaxPacketSize which is where the actual packet size
lies.
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Add a driver for the Renesas R-Car Gen1 RESET/WDT and R-Car Gen2/Gen3
and RZ/G RST module.
For now this driver just provides an API to obtain the state of the mode
pins, as latched at reset time. As this is typically called from the
probe function of a clock driver, which can run much earlier than any
initcall, calling rcar_rst_read_mode_pins() just forces an early
initialization of the driver.
Despite the current simple and almost identical handling for all
supported SoCs, the driver matches against SoC-specific compatible
values, as the features provided by the hardware module differ a lot
across the various SoC families and members.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Dirk Behme <dirk.behme@de.bosch.com>
skb->cb may contain data from previous layers. In the observed scenario,
the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
that small packets sent through the tunnel are mistakenly fragmented.
This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.
Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The old ethtool api (get_setting and set_setting) has generic mii
functions mii_ethtool_sset and mii_ethtool_gset.
To support the new ethtool api ({get|set}_link_ksettings), we add
two generics mii function mii_ethtool_{get|set}_link_ksettings_get.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree. This includes better integration with the routing subsystem for
nf_tables, explicit notrack support and smaller updates. More
specifically, they are:
1) Add fib lookup expression for nf_tables, from Florian Westphal. This
new expression provides a native replacement for iptables addrtype
and rp_filter matches. This is more flexible though, since we can
populate the kernel flowi representation to inquire fib to
accomodate new usecases, such as RTBH through skb mark.
2) Introduce rt expression for nf_tables, from Anders K. Pedersen. This
new expression allow you to access skbuff route metadata, more
specifically nexthop and classid fields.
3) Add notrack support for nf_tables, to skip conntracking, requested by
many users already.
4) Add boilerplate code to allow to use nf_log infrastructure from
nf_tables ingress.
5) Allow to mangle pkttype from nf_tables prerouting chain, to emulate
the xtables cluster match, from Liping Zhang.
6) Move socket lookup code into generic nf_socket_* infrastructure so
we can provide a native replacement for the xtables socket match.
7) Make sure nfnetlink_queue data that is updated on every packets is
placed in a different cache from read-only data, from Florian Westphal.
8) Handle NF_STOLEN from nf_tables core, also from Florian Westphal.
9) Start round robin number generation in nft_numgen from zero,
instead of n-1, for consistency with xtables statistics match,
patch from Liping Zhang.
10) Set GFP_NOWARN flag in skbuff netlink allocations in nfnetlink_log,
given we retry with a smaller allocation on failure, from Calvin Owens.
11) Cleanup xt_multiport to use switch(), from Gao feng.
12) Remove superfluous check in nft_immediate and nft_cmp, from
Liping Zhang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls
are followed by kicking the requeue list. Hence add an argument to
these two functions that allows to kick the requeue list. This was
proposed by Christoph Hellwig.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
have finished. This function does *not* wait until all outstanding
requests have finished (this means invocation of request.end_io()).
The algorithm used by blk_mq_quiesce_queue() is as follows:
* Hold either an RCU read lock or an SRCU read lock around
.queue_rq() calls. The former is used if .queue_rq() does not
block and the latter if .queue_rq() may block.
* blk_mq_quiesce_queue() first calls blk_mq_stop_hw_queues()
followed by synchronize_srcu() or synchronize_rcu(). The latter
call waits for .queue_rq() invocations that started before
blk_mq_quiesce_queue() was called.
* The blk_mq_hctx_stopped() calls that control whether or not
.queue_rq() will be called are called with the (S)RCU read lock
held. This is necessary to avoid race conditions against
blk_mq_quiesce_queue().
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
The function blk_queue_stopped() allows to test whether or not a
traditional request queue has been stopped. Introduce a helper
function that allows block drivers to query easily whether or not
one or more hardware contexts of a blk-mq queue have been stopped.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
This is a helper that pins down a range from an iov_iter and adds it to
a bio without requiring a separate memory allocation for the page array.
It will be used for upcoming direct I/O implementations for block devices
and iomap based file systems.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[hch: ported to the iov_iter interface, renamed and added comments.
All blame should be directed to me and all fame should go to Kent
after this!]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
If we're doing background type writes, then use the appropriate
background write flags for that.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Add wbc_to_write_flags(), which returns the write modifier flags to use,
based on a struct writeback_control. No functional changes in this
patch, but it prepares us for factoring other wbc fields for write type.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
This adds a new request flag, REQ_BACKGROUND, that callers can use to
tell the block layer that this is background (non-urgent) IO.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
A DAPM_DOUBLE_R control type can be used for dual channel mixer input
selectors / mute controls across 2 registers, possibly toggling both
channels together.
The control is meant to be shared by 2 widgets, 1 for each channel,
such that the mixer control exposed to userspace remains a combined
stereo control.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
A DAPM_DOUBLE control type can be used for dual channel mixer input
selectors / mute controls in one register, possibly toggling both
channels together.
The control is meant to be shared by 2 widgets, 1 for each channel,
such that the mixer control exposed to userspace remains a combined
stereo control.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
To support double channel shared controls split across 2 registers, one
for each channel, we must be able to update both registers together.
Add a second set of register fields to struct snd_soc_dapm_update, and
update the DAPM control writeback (put) callbacks to support this.
For codecs that use custom events which call into DAPM to do updates,
also clear struct snd_soc_dapm_update before using it, so the second
set of fields remains clean.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Introduces an nftables rt expression for routing related data with support
for nexthop (i.e. the directly connected IP address that an outgoing packet
is sent to), which can be used either for matching or accounting, eg.
# nft add rule filter postrouting \
ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop
This will drop any traffic to 192.168.1.0/24 that is not routed via
192.168.0.1.
# nft add rule filter postrouting \
flow table acct { rt nexthop timeout 600s counter }
# nft add rule ip6 filter postrouting \
flow table acct { rt nexthop timeout 600s counter }
These rules count outgoing traffic per nexthop. Note that the timeout
releases an entry if no traffic is seen for this nexthop within 10 minutes.
# nft add rule inet filter postrouting \
ether type ip \
flow table acct { rt nexthop timeout 600s counter }
# nft add rule inet filter postrouting \
ether type ip6 \
flow table acct { rt nexthop timeout 600s counter }
Same as above, but via the inet family, where the ether type must be
specified explicitly.
"rt classid" is also implemented identical to "meta rtclassid", since it
is more logical to have this match in the routing expression going forward.
Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Move layer 2 packet logging into nf_log_l2packet() that resides in
nf_log_common.c, so this can be shared by both bridge and netdev
families.
This patch adds the boiler plate code to register the netdev logging
family.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add FIB expression, supported for ipv4, ipv6 and inet family (the latter
just dispatches to ipv4 or ipv6 one based on nfproto).
Currently supports fetching output interface index/name and the
rtm_type associated with an address.
This can be used for adding path filtering. rtm_type is useful
to e.g. enforce a strong-end host model where packets
are only accepted if daddr is configured on the interface the
packet arrived on.
The fib expression is a native nftables alternative to the
xtables addrtype and rp_filter matches.
FIB result order for oif/oifname retrieval is as follows:
- if packet is local (skb has rtable, RTF_LOCAL set, this
will also catch looped-back multicast packets), set oif to
the loopback interface.
- if fib lookup returns an error, or result points to local,
store zero result. This means '--local' option of -m rpfilter
is not supported. It is possible to use 'fib type local' or add
explicit saddr/daddr matching rules to create exceptions if this
is really needed.
- store result in the destination register.
In case of multiple routes, search set for desired oif in case
strict matching is requested.
ipv4 and ipv6 behave fib expressions are supposed to behave the same.
[ I have collapsed Arnd Bergmann's ("netfilter: nf_tables: fib warnings")
http://patchwork.ozlabs.org/patch/688615/
to address fallout from this patch after rebasing nf-next, that was
posted to address compilation warnings. --pablo ]
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now that we have a separate header for struct bio_vec there is absolutely
no excuse for including this header from non-block I/O code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
It's only needed for the CONFIG_SWAP-only use of bio_end_io_t.
Because CONFIG_SWAP implies CONFIG_BLOCK this will allow to drop some
ifdefs in blk_types.h.
Instead we'll need to add a few explicit includes that were implicit
before, though.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
The file only needs the struct bvec_iter delcaration, which is available
from bvec.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>