Code is easy to read when it cleanly uses paired functions and naming,
like start <-> stop, register <-> unregister, etc.
But the current ALSA SoC code is quite random and unbalanced, with many
unpaired functions. Such code makes it easy to introduce bugs and hard
to debug them.
ALSA SoC has snd_soc_add_component(), but there is no paired
snd_soc_del_component(). Thus, snd_soc_unregister_component() calls the
cleanup functions in an ad-hoc way, which is difficult to read.
This patch adds the missing snd_soc_del_component_unlocked() and
balances the code.
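A minimal sketch of the intended pairing (the helper name comes from this
patch; the body and the exact cleanup calls are assumptions, not the verbatim
upstream code):

/* undo, step by step, what snd_soc_add_component() set up;
 * the caller is expected to hold client_mutex */
static void snd_soc_del_component_unlocked(struct snd_soc_component *component)
{
        snd_soc_unregister_dais(component);     /* pairs with DAI registration */
        list_del(&component->list);             /* pairs with the list add */
}

With such a helper, snd_soc_unregister_component() can simply take
client_mutex and call it, mirroring snd_soc_add_component().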
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://lore.kernel.org/r/8736f23jn4.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Code is easy to read when it cleanly uses paired functions and naming,
like start <-> stop, register <-> unregister, etc.
But the current ALSA SoC code is quite random and unbalanced, with many
unpaired functions. Such code makes it easy to introduce bugs and hard
to debug them.
ALSA SoC has soc_bind_dai_link(), but its paired soc_unbind_dai_link()
is not implemented.
Even more confusing, soc_remove_pcm_runtimes(), which should really be
soc_unbind_dai_link(), is implemented without being synchronised with
soc_bind_dai_link().
This patch cleans up this imbalance.
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://lore.kernel.org/r/877e4e3jni.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
If we focus on soc_bind_dai_link() in snd_soc_instantiate_card(),
we notice a very complex sequence of operations.
static int snd_soc_instantiate_card(...)
{
        ...
        /*
         * (1) Bind dai_link via card pre-linked dai_link
         *
         * Bind the dai_links that the card has pre-linked.
         * Each dai_link becomes one rtd and is connected to the card.
         * for_each_card_prelinks() iterates over the card pre-linked
         * dai_links.
         *
         * Image
         *
         *      card
         *          - rtd(A)
         *          - rtd(A)
         */
        for_each_card_prelinks(card, i, dai_link) {
                ret = soc_bind_dai_link(card, dai_link);
                ...
        }
        ...
        /*
         * (2) Connect card pre-linked dai_link to card list
         *
         * Connect all card pre-linked dai_links to the *card list*.
         * Here, (A) means "from card pre-link".
         *
         * Image
         *
         *      card                card list
         *          - rtd(A)            - dai_link(A)
         *          - rtd(A)            - dai_link(A)
         *          - ...               - ...
         */
        for_each_card_prelinks(card, i, dai_link) {
                ret = snd_soc_add_dai_link(card, dai_link);
                ...
        }
        ...
        /*
         * (3) Probe bound components
         *
         * Each rtd has many components.
         * Here we probe the components connected to each rtd.
         * rtd(A) in the image is the probe target.
         *
         * During this component probe, topology may add new dai_links to
         * the *card list* by using snd_soc_add_dai_link(), which is also
         * used at (2).
         * Here, (B) means "from topology".
         *
         * Image
         *
         *      card                card list
         *          - rtd(A)            - dai_link(A)
         *          - rtd(A)            - dai_link(A)
         *          - ...               - ...
         *                              - dai_link(B)
         *                              - dai_link(B)
         */
        ret = soc_probe_link_components(card);
        ...
        /*
         * (4) Bind dai_link again
         *
         * Bind dai_links again, this time for topology.
         * Note that (1) used for_each_card_prelinks(); here we use
         * for_each_card_links(), which walks the *card list*.
         *
         * As the image indicates, the card list has dai_link(A)
         * (from card pre-links) and dai_link(B) (from topology).
         * The main target here is dai_link(B);
         * soc_bind_dai_link() ignores already bound dai_links
         * (= dai_link(A)).
         *
         * Image
         *
         *      card                card list
         *          - rtd(A)            - dai_link(A)
         *          - rtd(A)            - dai_link(A)
         *          - ...               - ...
         *          - rtd(B)            - dai_link(B)
         *          - rtd(B)            - dai_link(B)
         */
        for_each_card_links(card, dai_link) {
                ret = soc_bind_dai_link(card, dai_link);
                ...
        }
        ...
}
As seen above, the flow is very complex.
The problem is that binding the "card pre-linked" dai_links (= (1)) and
the "topology added" dai_links (added at (3), bound at (4)) is handled
in separate places.
The code can be simpler if we bind a dai_link at the point where it is
connected to the *card list*.
This patch does that.
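Roughly, the resulting shape looks like this (an illustrative sketch following
the description above, not the verbatim patch):

/* one binding point for both card pre-linked and topology-added dai_links */
int snd_soc_add_dai_link(struct snd_soc_card *card,
                         struct snd_soc_dai_link *dai_link)
{
        int ret;

        ret = soc_bind_dai_link(card, dai_link);
        if (ret < 0)
                return ret;

        list_add_tail(&dai_link->list, &card->dai_link_list);

        return 0;
}

With this, snd_soc_instantiate_card() no longer needs the separate (1) and
(4) binding loops.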
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://lore.kernel.org/r/878sou3jnn.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The soc_is_dai_link_bound() check is called both
*before* soc_bind_dai_link() (A), and
*inside* soc_bind_dai_link() (B).
This is redundant, so let's remove one of them.
*       static int soc_bind_dai_link(...)
        {
                ...
(B)             if (soc_is_dai_link_bound(...)) {
                        ...
                        return 0;
                }
                ...
        }

        static int snd_soc_instantiate_card(...)
        {
                ...
                for_each_card_links(...) {
(A)                     if (soc_is_dai_link_bound(...))
                                continue;

*                       ret = soc_bind_dai_link(...);
                        if (ret)
                                goto probe_end;
                }
                ...
        }
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://lore.kernel.org/r/87a79a3jns.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
soc_init_dai_link() needs to be called before soc_bind_dai_link().
int snd_soc_instantiate_card()
{
        for_each_card_prelinks(...) {
(1)             ret = soc_init_dai_link(...);
                ...
        }
        ...
        for_each_card_prelinks(...) {
(2)             ret = soc_bind_dai_link(...);
                ...
        }
        ...
        for_each_card_links(...) {
                ...
(A)             ret = soc_init_dai_link(...);
                ...
(B)             ret = soc_bind_dai_link(...);
        }
        ...
(1) is for (2), and (A) is for (B).
(1) and (2) are for card pre-linked dai_links.
(A) and (B) are for topology-added dai_links.
Today soc_init_dai_link() only performs sanity checks on the dai_link and
does not initialize anything, so its name is confusing. We can rename it
to reflect that it is a sanity check.
Since this check exists for soc_bind_dai_link(), the code can be simpler
if we call it from soc_bind_dai_link() itself.
This patch renames it to soc_dai_link_sanity_check() and calls it from
soc_bind_dai_link().
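The resulting shape, roughly (an illustrative sketch, not the verbatim patch):

static int soc_bind_dai_link(struct snd_soc_card *card,
                             struct snd_soc_dai_link *dai_link)
{
        int ret;

        /* the former soc_init_dai_link(), now named for what it does,
         * is called here instead of by every caller */
        ret = soc_dai_link_sanity_check(card, dai_link);
        if (ret < 0)
                return ret;

        /* ... existing binding code continues unchanged ... */
        return 0;
}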
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://lore.kernel.org/r/87d0e63joh.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The ADAU7118 binding has an example where the codec has an i2c address of
14, with the unit address set to 14 as well.
However, while the address is expressed in decimal, the unit-address is
supposed to be in hexadecimal, which ends up with two different addresses
that trigger a DTC warning. Fix this by setting the address to 0x14.
Cc: Nuno Sá <nuno.sa@analog.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Fixes: 969d49b2cd ("dt-bindings: asoc: Add ADAU7118 documentation")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20191105105615.21391-1-maxime@cerno.tech
Signed-off-by: Mark Brown <broonie@kernel.org>
Convert Samsung S3C/S5P/Exynos Serial/UART bindings to DT schema format
using json-schema.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
The Allwinner SoCs have an mUSB controller that is supported in Linux, with
a matching Device Tree binding.
Now that we have the DT validation in place, let's convert the device tree
bindings for that controller over to a YAML schemas.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Rob Herring <robh@kernel.org>
Pull NVMe fixes from Keith:
"We have a few late nvme fixes for a couple device removal kernel
crashes, and a compat fix for a new ioctl introduced during this merge
window."
* 'nvme-5.4-rc7' of git://git.infradead.org/nvme:
nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
nvme-rdma: fix a segmentation fault during module unload
Commit:
ba816ad61f ("io_uring: run dependent links inline if possible")
introduced inline running of dependent links; enable it for poll commands
as well.
io_poll_complete_work() is the most important change, as it allows a
linked sequence of { POLL, READ } (for example) to proceed inline
instead of needing to get punted to another async context. The
submission side only potentially matters for sqthread, but may as well
include that bit.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no need to treat string arrays and single strings separately; we can
go exclusively by the element length in relation to the data type size.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is absolutely no reason to have them as we can handle it all nicely in
property_entry_read_int_array().
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Instead of explicitly setting values of integer types when copying
property entries, let's just copy the entire value union when processing
non-array values.
For value arrays we no longer use a union of pointers, but rather a single
void pointer, which allows us to remove property_set_pointer().
In property_get_pointer() we do not need to handle each data type
separately; we can simply return either the pointer or a pointer to the
values union.
We are not losing anything by removing the typed pointer union because the
upper layers do their accesses through void pointers anyway, and we
trust the "type" of the property when interpreting the data. We rely on
users of property entries to use the PROPERTY_ENTRY_XXX() macros to
properly initialize entries instead of poking in the instances directly.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Because property_copy_string_array() stores the newly allocated pointer in the
destination property, we end up with awkward code in property_entry_copy_data()
where we fetch the new pointer from dst.
Let's change property_copy_string_array() to return the pointer and rely on the
common path in property_entry_copy_data() to store it in the destination
structure.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Let's mark PROPERTY_ENTRY_* macros that are internal with double leading
underscores so users are not tempted to use them.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Let's switch to using PROPERTY_ENTRY_U8_ARRAY_LEN() to initialize
property entries. Also, when dumping data, rely on local variables
instead of poking into the property entry structure directly.
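For illustration, a hedged usage sketch of PROPERTY_ENTRY_U8_ARRAY_LEN()
(property names and values here are made up, not taken from the driver):

#include <linux/property.h>

static const u8 example_blob[] = { 0x01, 0x02, 0x03, 0x04 };

static const struct property_entry example_props[] = {
        /* length passed explicitly instead of derived via ARRAY_SIZE() */
        PROPERTY_ENTRY_U8_ARRAY_LEN("vendor,init-seq", example_blob, 4),
        PROPERTY_ENTRY_U32("vendor,clock-frequency", 400000),
        { }
};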
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Sometimes we want to initialize property entry array from a regular
pointer, when we can't determine length automatically via ARRAY_SIZE.
Let's introduce PROPERTY_ENTRY_XXX_ARRAY_LEN macros that take explicit
"len" argument.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Matteo Croce says:
====================
icmp: move duplicate code in helper functions
Remove some duplicate code by moving it in two helper functions.
First patch adds the helpers, the second one uses it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The same code which recognizes ICMP error packets is duplicated several
times. Use the icmp_is_err() and icmpv6_is_err() helpers instead, which
do the same thing.
ip_multipath_l3_keys() and tcf_nat_act() didn't check for all the error
types; assume that they should have, and use the helpers there as well.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two helper functions, one for IPv4 and one for IPv6, to recognize
the ICMP packets which are error responses.
These packets are special because they carry as payload the original
header of the packet which generated them (RFC 792 says at least 8 bytes,
but Linux actually includes much more than that).
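For reference, a sketch of what the IPv4 helper can look like (ICMP type
constants from <linux/icmp.h>; treat the exact body as illustrative):

#include <linux/types.h>
#include <linux/icmp.h>

static inline bool icmp_is_err(int type)
{
        switch (type) {
        case ICMP_DEST_UNREACH:
        case ICMP_SOURCE_QUENCH:
        case ICMP_REDIRECT:
        case ICMP_TIME_EXCEEDED:
        case ICMP_PARAMETERPROB:
                return true;
        }

        return false;
}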
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger says:
====================
netvsc: RSS related patches
Address a couple of issues related to recording RSS hash
value in skb. These were found by reviewing RSS support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since RSS hash is available from the host, record it in
the skb.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the driver needs to create a hash value because it was not done at
a higher level, the hash should be marked as a software hash rather than
a hardware hash.
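A hedged illustration of marking a driver-computed hash as a software hash
(helper from include/linux/skbuff.h; the exact call used by the fix may
differ):

#include <linux/skbuff.h>

static void example_record_sw_hash(struct sk_buff *skb, u32 hash)
{
        /* third argument: whether the hash is known to cover L4 ports */
        __skb_set_sw_hash(skb, hash, false);
}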
Fixes: f72860afa2 ("hv_netvsc: Exclude non-TCP port numbers from vRSS hashing")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't swap the oper and admin schedules too early; it's not correct and
causes a crash.
Steps to reproduce:
1)
tc qdisc replace dev eth0 parent root handle 100 taprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 1@2 \
base-time $SOME_BASE_TIME \
sched-entry S 01 80000 \
sched-entry S 02 15000 \
sched-entry S 04 40000 \
flags 2
2)
tc qdisc replace dev eth0 parent root handle 100 taprio \
base-time $SOME_BASE_TIME \
sched-entry S 01 90000 \
sched-entry S 02 20000 \
sched-entry S 04 40000 \
flags 2
3)
tc qdisc replace dev eth0 parent root handle 100 taprio \
base-time $SOME_BASE_TIME \
sched-entry S 01 150000 \
sched-entry S 02 200000 \
sched-entry S 04 40000 \
flags 2
Repeat steps 2, 3, 2, ... a few more times if the crash does not happen,
and observe:
[ 305.832319] Unable to handle kernel write to read-only memory at
virtual address ffff0000087ce7f0
[ 305.910887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
[ 305.919306] Hardware name: Texas Instruments AM654 Base Board (DT)
[...]
[ 306.017119] x1 : ffff800848031d88 x0 : ffff800848031d80
[ 306.022422] Call trace:
[ 306.024866] taprio_free_sched_cb+0x4c/0x98
[ 306.029040] rcu_process_callbacks+0x25c/0x410
[ 306.033476] __do_softirq+0x10c/0x208
[ 306.037132] irq_exit+0xb8/0xc8
[ 306.040267] __handle_domain_irq+0x64/0xb8
[ 306.044352] gic_handle_irq+0x7c/0x178
[ 306.048092] el1_irq+0xb0/0x128
[ 306.051227] arch_cpu_idle+0x10/0x18
[ 306.054795] do_idle+0x120/0x138
[ 306.058015] cpu_startup_entry+0x20/0x28
[ 306.061931] rest_init+0xcc/0xd8
[ 306.065154] start_kernel+0x3bc/0x3e4
[ 306.068810] Code: f2fbd5b7 f2fbd5b6 d503201f f9400422 (f9000662)
[ 306.074900] ---[ end trace 96c8e2284a9d9d6e ]---
[ 306.079507] Kernel panic - not syncing: Fatal exception in interrupt
[ 306.085847] SMP: stopping secondary CPUs
[ 306.089765] Kernel Offset: disabled
To explain one of the possible crash cases:
The "real" admin list is assigned when admin_sched is set to new_admin;
this happens after the "swap", which assigns NULL to oper_sched. Thus,
calling qdisc show at that point can crash.
Further, the second time the sched list is updated, admin_sched is not
NULL and becomes oper_sched; the previous oper_sched was NULL, so it is
just skipped. But then admin_sched is assigned new_admin, while the
previously assigned admin_sched (which has already become oper_sched) is
scheduled to be freed.
Further, the third time the sched list is updated, during one more swap,
oper_sched is not NULL, but it has already been scheduled for freeing
(during the previous admin update), so trying to free oper_sched causes
a kernel panic in taprio_free_sched_cb().
So, move the "swap emulation" to where it should be according to the
function comment in the code.
Fixes: 9c66d15646 ("taprio: Add support for hardware offloading")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-11-04
This series contains updates to the ice driver only.
Anirudh refactors the code to reduce the kernel configuration flags and
introduces the ice_base.c file.
Maciej does additional refactoring of the transmit ring configuration so
that we are not configuring it per traffic class flow. Support for XDP is
added to the ice driver, along with additional re-organization of the
code in preparation for adding build_skb() support. The computational
padding logic for headroom and tailroom is adjusted to better support
build_skb(), which also aligns with the logic in other Intel LAN drivers.
build_skb() support is added, making use of XDP's data_meta.
Krzysztof refactors the driver to prepare for AF_XDP support in the
driver and then adds support for AF_XDP.
v2: Updated patch 3 of the series based on community feedback with the
following changes...
- return -EOPNOTSUPP instead of ENOTSUPP for too large MTU which makes
it impossible to attach XDP prog
- don't check for case when there's no XDP prog currently on interface
and ice_xdp() is called with NULL bpf_prog; this happens when user
does "ip link set eth0 xdp off" and no prog is present on VSI; no need
for that as it is handled by higher layer
- drop the extack message for unknown xdp->command
- use the smp_processor_id() for accessing the XDP Tx ring for XDP_TX
action
- don't leave the interface in downed state in case of any failure
during the XDP Tx resources handling
- undo rename of ice_build_ctob
The above changes caused a ripple effect in patches 4 & 5 to update
references to ice_build_ctob() which are now build_ctob()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2019-11-04
This series contains old Halloween candy updates, yet still sweet, to
fm10k, ixgbe and i40e.
Jake adds the missing initializers for a couple of the TLV attribute
macros, adds support for capturing and reporting statistics for all of
the VFs in a given PF and, lastly, bumps the version of the fm10k driver
to reflect the recent changes.
Alex addresses locality issues in the ixgbe driver when it is loaded on
a system supporting multiple NUMA nodes.
Manjunath Patil provides changes to the ixgbe driver, similar to those
made to igb, to prevent transmit packets from requesting a hardware
timestamp when the NIC has not been set up via the SIOCSHWTSTAMP ioctl.
Alice adds support for x710 by adding the missing device IDs in the
appropriate places to ensure all the features are enabled in i40e.
Jesse adds support for VF stats gathering in i40e via the kernel's
ndo_get_vf_stats function.
v2: Fixed up commit ID references in patch 5's description to align with
how commit IDs should be referenced.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
pull-request: can 2019-11-05
this is a pull request of 33 patches for net/master.
The first patch, by Wen Yang, adds a missing of_node_put() to the CAN
device infrastructure.
Navid Emamdoost's patch for the gs_usb driver fixes a memory leak in the
gs_can_open() error path.
Johan Hovold provides two patches, one for the mcba_usb, the other for the
usb_8dev driver. Both fix a use-after-free after USB-disconnect.
Joakim Zhang's patch improves the flexcan driver: the ECC mechanism is
now completely disabled instead of only masking its interrupts.
The next three patches all target the peak_usb driver. Stephane Grosjean's
patch fixes a potential out-of-sync condition while decoding packets, Johan
Hovold's patch fixes a slab info leak, and Jeroen Hofstee's patch adds
missing reporting of bus off recovery events.
These are followed by three patches for the c_can driver. Kurt Van Dijck's
patch fixes the detection of potentially missing status IRQs, and Jeroen
Hofstee's patches add a chip reset on open and add missing reporting of
bus off recovery events.
Appana Durga Kedareswara rao's patch for the xilinx driver fixes the flags
field initialization for axi CAN.
The next seven patches, by me and Jeroen Hofstee, target the rx-offload
helper. The error handling in case of a queue overflow is fixed, removing
a memory leak. Further, the error handling in case of queue overflow and
skb OOM is cleaned up.
The next two patches are by me and target the flexcan and ti_hecc
drivers. In case of an error during can_rx_offload_queue_sorted(), the
error counters in the drivers are incremented.
Jeroen Hofstee provides 6 patches for the ti_hecc driver, which properly stop
the device in ifdown, improve the rx-offload support (which hit mainline in
v5.4-rc1), and add missing FIFO overflow and state change reporting.
The following four patches target the j1939 protocol. Colin Ian King's patch
fixes a memory leak in the j1939_sk_errqueue() handling. Three patches by
Oleksij Rempel fix a memory leak on socket release and fix the EOMA packet in
the transport protocol.
Timo Schlüßler's patch fixes a potential race condition in the mcp251x
driver after suspend.
The last patch is by Yegor Yefremov and updates the SPDX-License-Identifier to
v3.0.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Change nvme_passthru_cmd64 to add a field: rsvd2. This field is an
explicit marker for the padding space added on certain platforms as a
result of enlarging the result field from 32 to 64 bits, and it fixes
differences in struct size when using compat ioctl for 32-bit binaries on
a 64-bit architecture.
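To illustrate the underlying issue with a self-contained (userspace) sketch:
a trailing u64 after a u32 is padded differently on 32-bit and 64-bit ABIs,
so sizeof() differs unless the padding is made explicit. The field names
below are illustrative, not the full uapi struct:

#include <stdint.h>
#include <stdio.h>

struct example_implicit {
        uint32_t timeout_ms;
        /* implicit padding: 4 bytes on x86-64, none on i386 */
        uint64_t result;
};

struct example_explicit {
        uint32_t timeout_ms;
        uint32_t rsvd2;         /* explicit marker for the padding space */
        uint64_t result;
};

int main(void)
{
        /* sizeof(example_implicit) is 12 on i386 but 16 on x86-64;
         * sizeof(example_explicit) is 16 on both */
        printf("%zu %zu\n", sizeof(struct example_implicit),
               sizeof(struct example_explicit));
        return 0;
}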
Fixes: 65e68edce0 ("nvme: allow 64-bit results in passthru commands")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Charles Machalow <csm10495@gmail.com>
[changelog]
Signed-off-by: Keith Busch <kbusch@kernel.org>
Currently we reserve j_max_transaction_buffers / 32 for transaction
descriptor blocks. Now that revoke descriptors are accounted for
separately, this estimate is unnecessarily high and we can actually
compute a much tighter estimate. In the common case of 32k journal blocks
and 4k blocksize this reduces the number of reserved descriptor blocks
from 256 to ~25, which allows us to fit more real data into a
transaction.
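A back-of-the-envelope check of those numbers (the tag and header sizes below
are assumptions for illustration, not the exact jbd2 on-disk constants):

#include <stdio.h>

int main(void)
{
        unsigned int journal_blocks = 32768, blocksize = 4096;
        unsigned int max_tx = journal_blocks / 4;               /* max transaction: 8192 blocks */
        unsigned int old_resv = max_tx / 32;                    /* old reservation: 256 */
        unsigned int tags_per_desc = (blocksize - 12) / 12;     /* ~340 tags per descriptor */
        unsigned int new_resv = (max_tx + tags_per_desc - 1) / tags_per_desc;

        printf("old=%u new=%u\n", old_resv, new_resv);          /* old=256 new=25 */
        return 0;
}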
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-25-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
So far we have reserved only a relatively high fixed amount of revoke
credits for each transaction. We over-reserved by a large amount for most
cases, but when freeing large directories or files with data journalling,
the fixed amount is not enough. In fact, the worst case estimate is
inconveniently large (maximum extent size) for freeing a single extent.
We fix this by doing a proper estimate of the number of blocks that need
to be revoked when removing blocks from the inode due to truncate or
hole punching, and otherwise reserve just a small amount of revoke
credits for each transaction to accommodate freeing of an xattr block or
so.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Make the checking of available credits in jbd2_journal_dirty_metadata()
more strict. There should always be enough credits in the handle to write
all potential revoke descriptors. Also, warn in case there are not enough
credits, since that indicates a bug in the filesystem.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-22-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Extend the functions for starting, extending, and restarting transaction
handles to take the number of revoke records the handle must be able to
accommodate. These functions then make sure the transaction has enough
credits to be able to store the resulting revoke descriptor blocks. The
revoke code also tracks the number of revoke records created by a handle
to catch situations where some place didn't reserve enough space for
revoke records. Similarly to standard transaction credits, space for
unused reserved revoke records is released when the handle is stopped.
On the ext4 side we currently take the simplistic approach of reserving
space for 1024 revoke records for any transaction. This grows the number
of credits reserved for each handle only by a few and is enough for any
normal workload, so that we don't hit warnings in jbd2. We will refine
the logic in following commits.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently, journal descriptor blocks are not accounted in
transaction->t_outstanding_credits; we just leave some slack space in the
journal for them (in jbd2_log_space_left() and jbd2_space_needed()). This
makes proper accounting (and the reservation we want to add) of
descriptor blocks difficult, so switch to accounting descriptor blocks in
transaction->t_outstanding_credits and just reserve the same amount of
credits in t_outstanding_credits for journal descriptor blocks when
creating a transaction.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-18-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
jbd2__journal_restart() has quite some code that is common with
jbd2_journal_stop(). Factor this functionality into stop_this_handle()
helper and use it from both functions. Note that this also drops
t_handle_lock protection from jbd2__journal_restart() as
jbd2_journal_stop() does the same thing without it.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-17-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we drop the last handle from a transaction and
journal->j_barrier_count > 0, jbd2_journal_stop() wakes up the
journal->j_wait_transaction_locked wait queue. This looks pointless:
waiting for outstanding handles always happens on the
journal->j_wait_updates waitqueue.
journal->j_wait_transaction_locked is used to wait for transaction state
changes and by start_this_handle() for waiting until
journal->j_barrier_count drops to 0. The first case is clearly irrelevant
here since only the jbd2 thread changes the transaction state. The second
case looks related, but jbd2_journal_unlock_updates() is responsible for
the wakeup in that case. So just drop the wakeup.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-16-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If a transaction is larger than journal->j_max_transaction_buffers, that
is a bug and not a trigger for transaction commit. Also, the very next
attempt to start a new handle will start a transaction commit anyway. So
just remove the pointless check. Arguably, we could start a transaction
commit whenever the transaction size gets *close* to
journal->j_max_transaction_buffers. This has the potential to reduce the
latency of the next jbd2_journal_start() at the cost of somewhat smaller
transactions. However, for this to have any effect, it would mean that
nobody is already waiting in jbd2_journal_start(), which means the
metadata load for the fs is pretty light anyway, so this optimization is
probably not worth it.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-15-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>