commit 36415a7964711822e63695ea67fede63979054d9 upstream.
The brcmnand driver contains a bug: if a page (for example 2 KiB) is read
from the parallel/ONFI NAND and, within that page, a subpage (512 bytes)
with correctable errors is followed by a subpage with uncorrectable errors,
the page read returns the wrong status of correctable (as opposed to the
actual status of uncorrectable).
The bug is in the function brcmnand_read_by_pio(), where the check for
uncorrectable bits is preempted if a status for correctable bits was
detected earlier.
The fix is to stop checking for bad bits only once we already have a
bad-bits (uncorrectable) status.
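The intended priority can be illustrated with a small, self-contained C
sketch (plain user-space code, not the driver itself): an uncorrectable
result for any subpage must never be downgraded by a correctable result
seen for another subpage.

    #include <stdio.h>

    enum ecc_status { ECC_OK, ECC_CORRECTABLE, ECC_UNCORRECTABLE };

    /* Fold per-subpage ECC results into one page status; uncorrectable is sticky. */
    static enum ecc_status page_status(const enum ecc_status *sub, int nsub)
    {
            enum ecc_status status = ECC_OK;
            int i;

            for (i = 0; i < nsub; i++) {
                    if (sub[i] == ECC_UNCORRECTABLE)
                            return ECC_UNCORRECTABLE; /* not preempted by an earlier correctable hit */
                    if (sub[i] == ECC_CORRECTABLE)
                            status = ECC_CORRECTABLE;
            }
            return status;
    }

    int main(void)
    {
            enum ecc_status sub[4] = { ECC_OK, ECC_CORRECTABLE, ECC_UNCORRECTABLE, ECC_OK };

            printf("%d\n", page_status(sub, 4)); /* prints 2, i.e. uncorrectable */
            return 0;
    }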
Fixes: 27c5b17cd1 ("mtd: nand: add NAND driver "library" for Broadcom STB NAND controller")
Signed-off-by: david regan <dregan@mail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/trinity-478e0c09-9134-40e8-8f8c-31c371225eda-1643237024774@3c-app-mailcom-lxa02
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c23b3f965bc9ee696bf2ed4bdc54d339dd9a455 upstream.
Interacting with a NAND chip on an IPQ6018 I found that the qcomsmem NAND
partition parser was returning -EPROBE_DEFER waiting for the main smem
driver to load.
This caused the board to reset. Playing about with the probe() function
shows that the problem lies in the core clock being switched off before the
nandc_unalloc() routine has completed.
If we look at how qcom_nandc_remove() tears down allocated resources we see
the expected order is
    qcom_nandc_unalloc(nandc);
    clk_disable_unprepare(nandc->aon_clk);
    clk_disable_unprepare(nandc->core_clk);
    dma_unmap_resource(&pdev->dev, nandc->base_dma, resource_size(res),
                       DMA_BIDIRECTIONAL, 0);
Tweaking probe() to both bring up and tear down resources in that order
removes the reset if we end up deferring elsewhere.
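A schematic of the matching probe() error unwind, in the same order as the
remove() path above; the goto labels here are illustrative, not the exact
ones used in the driver:

    /* unwind in the reverse order of setup, mirroring qcom_nandc_remove() */
    undo_nandc_alloc:
            qcom_nandc_unalloc(nandc);
    undo_aon_clk:
            clk_disable_unprepare(nandc->aon_clk);
    undo_core_clk:
            clk_disable_unprepare(nandc->core_clk);
    undo_dma_map:
            dma_unmap_resource(&pdev->dev, nandc->base_dma, resource_size(res),
                               DMA_BIDIRECTIONAL, 0);
            return ret;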
Fixes: c76b78d8ec ("mtd: nand: Qualcomm NAND controller driver")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220103030316.58301-2-bryan.odonoghue@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3593030761630e09200072a4bd06468892c27be3 upstream.
Daniel Gibson reports that the n_tty code gets line termination wrong in
very specific cases:
"If you feed a line with exactly 64 chars + terminating newline, and
directly afterwards (without reading) another line into a pseudo
terminal, the first read() on the other side will return the 64
char line *without* terminating newline, and the next read() will
return the missing terminating newline AND the complete next line (if
it fits in the buffer)"
and bisected the behavior to commit 3b830a9c34d5 ("tty: convert
tty_ldisc_ops 'read()' function to take a kernel pointer").
Now, digging deeper, it turns out that the behavior isn't exactly new:
what changed in commit 3b830a9c34d5 was that the tty line discipline
.read() function is now passed an intermediate kernel buffer rather than
the final user space buffer.
And that intermediate kernel buffer is 64 bytes in size - thus that
special case with exactly 64 bytes plus terminating newline.
The same problem did exist before, but historically the boundary was not
the 64-byte chunk, but the user-supplied buffer size, which is obviously
generally bigger (and potentially bigger than N_TTY_BUF_SIZE, which
would hide the issue entirely).
The reason is that the n_tty canon_copy_from_read_buf() code would look
ahead for the EOL character one byte further than it would actually
copy. It would then decide that it had found the terminator, and unmark
it as an EOL character - which in turn explains why the next read
wouldn't then be terminated by it.
Now, the reason it did all this in the first place is related to some
historical and pretty obscure EOF behavior, see commit ac8f3bf883
("n_tty: Fix poll() after buffer-limited eof push read") and commit
40d5e0905a ("n_tty: Fix EOF push handling").
And the reason for the EOL confusion is that we treat EOF as a special
EOL condition, with the EOL character being NUL (aka "__DISABLED_CHAR"
in the kernel sources).
So that EOF look-ahead also affects the normal EOL handling.
This patch just removes the look-ahead that causes problems, because EOL
is much more critical than the historical "EOF in the middle of a line
that coincides with the end of the buffer" handling ever was.
Now, it is possible that we should indeed re-introduce the "look at next
character to see if it's a EOF" behavior, but if so, that should be done
not at the kernel buffer chunk boundary in canon_copy_from_read_buf(),
but at a higher level, when we run out of the user buffer.
In particular, the place to do that would be at the top of
'n_tty_read()', where we check if it's a continuation of a previously
started read and there is no more buffer space left; we could decide to
just eat the __DISABLED_CHAR at that point.
But that would be a separate patch, because I suspect nobody actually
cares, and I'd like to get a report about it before bothering.
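A self-contained sketch of the rule the patch enforces (illustrative
names, not the n_tty code): the terminator is searched for only within
the bytes that will actually be copied, never one byte beyond the chunk.

    #include <stddef.h>

    /* Return how many of the 'avail' buffered bytes to copy: up to and
     * including an EOL if one is found, otherwise at most the chunk size
     * 'nr'.  The scan stops at min(avail, nr) and never peeks at byte nr. */
    static size_t bytes_to_copy(const char *buf, size_t avail, size_t nr, char eol)
    {
            size_t limit = avail < nr ? avail : nr;
            size_t i;

            for (i = 0; i < limit; i++)
                    if (buf[i] == eol)
                            return i + 1;   /* include the terminator */
            return limit;                   /* no EOL in this chunk; no look-ahead */
    }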
Fixes: 3b830a9c34d5 ("tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer")
Fixes: ac8f3bf883 ("n_tty: Fix poll() after buffer-limited eof push read")
Fixes: 40d5e0905a ("n_tty: Fix EOF push handling")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215611
Reported-and-tested-by: Daniel Gibson <metalcaedes@gmail.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d19e0183a88306acda07f4a01fedeeffe2a2a06b upstream.
The result of the writeback, whether it is an ENOSPC or an EIO, or
anything else, does not inhibit the NFS client from reporting the
correct file timestamps.
Fixes: 79566ef018 ("NFS: Getattr doesn't require data sync semantics")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0caaf75d443e02e55e146fd75fe2efc8aed5540 upstream.
Commit ac795161c936 ("NFSv4: Handle case where the lookup of a directory
fails") [1], part of Linux since 5.17-rc2, introduced a regression where
a symbolic link on an NFS mount to a directory on another NFS mount does
not resolve(?) the first time it is accessed.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Fixes: ac795161c936 ("NFSv4: Handle case where the lookup of a directory fails")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e92bc4cd34de2ce454bdea8cd198b8067ee4e123 upstream.
wbt is now disabled by setting WBT_STATE_OFF_DEFAULT in
wbt_disable_default() when the elevator is switched to bfq, and it is
re-enabled by wbt_enable_default() when the SCSI device is removed.
If that re-enabling happens between wbt_wait() and wbt_track() while a
write request is being submitted, the enable check yields a false positive.
The following is the scenario that triggered the problem.
T1                            T2                            T3
elevator_switch_mq
bfq_init_queue
wbt_disable_default <= Set
rwb->enable_state (OFF)
                              submit_bio
                              blk_mq_make_request
                              rq_qos_throttle
                              <= rwb->enable_state (OFF)
                                                            scsi_remove_device
                                                            sd_remove
                                                            del_gendisk
                                                            blk_unregister_queue
                                                            elv_unregister_queue
                                                            wbt_enable_default
                                                            <= Set rwb->enable_state (ON)
                              rq_qos_track
                              <= rwb->enable_state (ON)
^^^^^^ this request is marked WBT_TRACKED without a matching inflight add,
which later drops rqw->inflight to -1 in wbt_done() and triggers an I/O hang.
Fix this by moving wbt_enable_default() from elv_unregister_queue() to
bfq_exit_queue(), so that wbt is only re-enabled when bfq exits.
Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
Also remove a stale one-line comment and a one-shot local variable.
Signed-off-by: Ming Lei <ming.lei@rehdat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-block/20211214133103.551813-1-qiulaibin@huawei.com/
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55467fcd55b89c622e62b4afe60ac0eb2fae91f2 upstream.
Always signal that emulation is possible for !SEV guests regardless of
whether or not the CPU provided a valid instruction byte stream. KVM can
read all guest state (memory and registers) for !SEV guests, i.e. can
fetch the code stream from memory even if the CPU failed to do so because
of the SMAP errata.
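The decision being described reduces to the following shape (a simplified,
self-contained sketch, not the actual svm.c code):

    #include <stdbool.h>

    /* May KVM emulate in response to a fault even if the CPU handed us an
     * empty instruction byte stream (the SMAP erratum case)? */
    static bool can_emulate_instruction(bool sev_guest, unsigned int insn_len)
    {
            if (!sev_guest)
                    return true;    /* KVM can always fetch the code stream itself */

            /* for SEV guests, guest memory is not readable by KVM, so an
             * empty buffer cannot be recovered */
            return insn_len != 0;
    }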
Fixes: 05d5a48635 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
Cc: stable@vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[jwang: adjust context for kernel 5.10.101]
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe663df7825811358531dc2e8a52d9eaa5e3515e upstream.
Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
2.37.90.20220207) the following build error shows up:
{standard input}: Assembler messages:
{standard input}:2088: Error: unrecognized opcode: `ptesync'
make[3]: *** [/builds/linux/scripts/Makefile.build:287: arch/powerpc/lib/sstep.o] Error 1
Add an '#ifdef CONFIG_PPC64' around the 'ptesync' in
'emulate_update_regs()', like the one in 'analyse_instr()', since it looks
like it got dropped inadvertently by commit 3cdfcbfd32 ("powerpc: Change
analyse_instr so it doesn't modify *regs").
A key detail is that analyse_instr() will never recognise lwsync or
ptesync on 32-bit (because of the existing ifdef), and as a result
emulate_update_regs() should never be called with an op specifying
either of those on 32-bit. So removing them from emulate_update_regs()
should be a nop in terms of runtime behaviour.
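Schematically, the guarded cases in emulate_update_regs() end up looking
like this (a sketch matching the description above, not necessarily the
exact hunk):

    #ifdef CONFIG_PPC64
            case BARRIER_LWSYNC:
                    asm volatile("lwsync" : : : "memory");
                    break;
            case BARRIER_PTESYNC:
                    asm volatile("ptesync" : : : "memory");
                    break;
    #endif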
Fixes: 3cdfcbfd32 ("powerpc: Change analyse_instr so it doesn't modify *regs")
Cc: stable@vger.kernel.org # v4.14+
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
[mpe: Add last paragraph of change log mentioning analyse_instr() details]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220211005113.1361436-1-anders.roxell@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a845837e3d0ddaed493b4c5c4643d7f0542804d upstream.
The recently introduced coef_mutex in the Realtek codec driver seems to
cause a deadlock when the relevant code is invoked from the power-off
state; the HD-audio core then tries to power up internally, and this kicks
off the codec runtime PM code, which tries to take the same coef_mutex.
In order to avoid the deadlock, do a temporary power up/down around the
coef_mutex acquisition and release. This ensures that the power-up
sequence runs before the mutex is taken, hence no re-entrance will happen.
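A sketch of the resulting lock/unlock helpers (the helper names are
illustrative; the point is the ordering of power-up versus mutex
acquisition):

    static void coef_mutex_lock(struct hda_codec *codec)
    {
            struct alc_spec *spec = codec->spec;

            snd_hda_power_up_pm(codec);     /* resume before taking the mutex */
            mutex_lock(&spec->coef_mutex);
    }

    static void coef_mutex_unlock(struct hda_codec *codec)
    {
            struct alc_spec *spec = codec->spec;

            mutex_unlock(&spec->coef_mutex);
            snd_hda_power_down_pm(codec);   /* power down only after release */
    }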
Fixes: b837a9f5ab3b ("ALSA: hda: realtek: Fix race at concurrent COEF updates")
Reported-and-tested-by: Julian Wollrath <jwollrath@web.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220214132838.4db10fca@schienar
Link: https://lore.kernel.org/r/20220214130410.21230-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7e793a867ae312cecdeb6f06cceff98263e75dd upstream.
non-regular file needs to be compiled and then copied to the output
directory. Remove it from TEST_PROGS and add it to TEST_GEN_PROGS. This
removes error thrown by rsync when non-regular object isn't found:
rsync: [sender] link_stat "/linux/tools/testing/selftests/exec/non-regular" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Fixes: 0f71241a8e ("selftests/exec: add file type errno tests")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31ded1535e3182778a1d0e5c32711f55da3bc512 upstream.
This was detected by gcc 12 in Fedora Rawhide:
50 11.01 fedora:rawhide : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC)
inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9:
util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free]
1225 | *key_scan_pos += strlen(map_opt);
| ^~~~~~~~~~~~~~~
util/bpf-loader.c:1223:9: note: call to 'free' here
1223 | free(map_name);
| ^~~~~~~~~~~~~~
cc1: all warnings being treated as errors
So do the calculations on the pointer before freeing it.
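The pattern, in a self-contained form (hypothetical helper; map_opt points
into the map_name allocation): measure the string while its storage is
still alive, then free.

    #include <stdlib.h>
    #include <string.h>

    static void advance_and_free(char *map_name, const char *map_opt,
                                 size_t *key_scan_pos)
    {
            /* take the length before the backing storage is released ... */
            size_t opt_len = strlen(map_opt);

            free(map_name);

            /* ... and only then update the scan position */
            *key_scan_pos += opt_len;
    }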
Fixes: 04f9bf2bac ("perf bpf-loader: Add missing '*' for key_scan_pos")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Link: https://lore.kernel.org/lkml/Yg1VtQxKrPpS3uNA@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07dd44852be89386ab12210df90a2d78779f3bff upstream.
The 1588 Single Step Timestamping code path uses a mutex to
enforce atomicity for two events:
- update of ptp single step register
- transmit ptp event packet
Before this patch the mutex was not initialized. This
caused unexpected crashes in the Tx function.
Fixes: c55211892f ("dpaa2-eth: support PTP Sync packet one-step timestamping")
Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52a9dab6d892763b2a8334a568bd4e2c1a6fde66 upstream.
GCC 12 correctly reports a potential use-after-free condition in the
xrealloc helper. Fix the warning by avoiding an implicit "free(ptr)"
when size == 0:
In file included from help.c:12:
In function 'xrealloc',
inlined from 'add_cmdname' at help.c:24:2: subcmd-util.h:56:23: error: pointer may be used after 'realloc' [-Werror=use-after-free]
56 | ret = realloc(ptr, size);
| ^~~~~~~~~~~~~~~~~~
subcmd-util.h:52:21: note: call to 'realloc' here
52 | void *ret = realloc(ptr, size);
| ^~~~~~~~~~~~~~~~~~
subcmd-util.h:58:31: error: pointer may be used after 'realloc' [-Werror=use-after-free]
58 | ret = realloc(ptr, 1);
| ^~~~~~~~~~~~~~~
subcmd-util.h:52:21: note: call to 'realloc' here
52 | void *ret = realloc(ptr, size);
| ^~~~~~~~~~~~~~~~~~
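A minimal sketch of an xrealloc() along those lines (it illustrates the
idea of never handing realloc() a zero size, not necessarily the exact
tools/lib/subcmd helper):

    #include <stdio.h>
    #include <stdlib.h>

    static void *xrealloc(void *ptr, size_t size)
    {
            /* never pass size == 0, which may act as an implicit free(ptr) */
            void *ret = realloc(ptr, size ? size : 1);

            if (!ret) {
                    fprintf(stderr, "Out of memory, realloc failed\n");
                    exit(1);
            }
            return ret;
    }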
Fixes: 2f4ce5ec1d ("perf tools: Finalize subcmd independence")
Reported-by: Valdis Klētnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Valdis Klētnieks <valdis.kletnieks@vt.edu>
Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: linux-hardening@vger.kernel.org
Cc: Valdis Klētnieks <valdis.kletnieks@vt.edu>
Link: http://lore.kernel.org/lkml/20220213182443.4037039-1-keescook@chromium.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dcd54265c8bc14bd023815e36e2d5f9d66ee1fee upstream.
trace_napi_poll_hit() reads stat->dev while another thread can write to
it from dropmon_net_event().
Use READ_ONCE()/WRITE_ONCE() here; RCU rules are already properly
enforced, we only have to take care of load/store tearing.
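Schematically, the annotations look like this (fragments only; variable
names abbreviated):

    /* writer side, dropmon_net_event(): clear the pointer with a marked store */
    WRITE_ONCE(stat->dev, NULL);

    /* reader side, trace_napi_poll_hit(): load it once, with a marked read */
    struct net_device *dev = READ_ONCE(stat->dev);

    if (dev == napi->dev) {
            /* ... account the drop against this device ... */
    }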
BUG: KCSAN: data-race in dropmon_net_event / trace_napi_poll_hit
write to 0xffff88816f3ab9c0 of 8 bytes by task 20260 on cpu 1:
dropmon_net_event+0xb8/0x2b0 net/core/drop_monitor.c:1579
notifier_call_chain kernel/notifier.c:84 [inline]
raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:392
call_netdevice_notifiers_info net/core/dev.c:1919 [inline]
call_netdevice_notifiers_extack net/core/dev.c:1931 [inline]
call_netdevice_notifiers net/core/dev.c:1945 [inline]
unregister_netdevice_many+0x867/0xfb0 net/core/dev.c:10415
ip_tunnel_delete_nets+0x24a/0x280 net/ipv4/ip_tunnel.c:1123
vti_exit_batch_net+0x2a/0x30 net/ipv4/ip_vti.c:515
ops_exit_list net/core/net_namespace.c:173 [inline]
cleanup_net+0x4dc/0x8d0 net/core/net_namespace.c:597
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
read to 0xffff88816f3ab9c0 of 8 bytes by interrupt on cpu 0:
trace_napi_poll_hit+0x89/0x1c0 net/core/drop_monitor.c:292
trace_napi_poll include/trace/events/napi.h:14 [inline]
__napi_poll+0x36b/0x3f0 net/core/dev.c:6366
napi_poll net/core/dev.c:6432 [inline]
net_rx_action+0x29e/0x650 net/core/dev.c:6519
__do_softirq+0x158/0x2de kernel/softirq.c:558
do_softirq+0xb1/0xf0 kernel/softirq.c:459
__local_bh_enable_ip+0x68/0x70 kernel/softirq.c:383
__raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
_raw_spin_unlock_bh+0x33/0x40 kernel/locking/spinlock.c:210
spin_unlock_bh include/linux/spinlock.h:394 [inline]
ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline]
wg_packet_decrypt_worker+0x73c/0x780 drivers/net/wireguard/receive.c:506
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
value changed: 0xffff88815883e000 -> 0x0000000000000000
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 26435 Comm: kworker/0:1 Not tainted 5.17.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker
Fixes: 4ea7e38696 ("dropmon: add ability to detect when hardware dropsrxpackets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6ab75cec1e461f8a35559054c146c21428430b8 upstream.
In __bond_release_one(), bond_set_carrier() is only called when bond
device has no slave. Therefore, if we remove the up slave from a master
with two slaves and keep the down slave, the master will remain up.
Fix this by moving bond_set_carrier() out of the
if (!bond_has_slaves(bond)) statement.
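Schematically, the control-flow change in __bond_release_one() is (sketch
only):

    /* before: carrier only re-evaluated when the last slave is gone */
    if (!bond_has_slaves(bond)) {
            /* ... */
            bond_set_carrier(bond);
    }

    /* after: always re-evaluate carrier once a slave has been released */
    if (!bond_has_slaves(bond)) {
            /* ... */
    }
    bond_set_carrier(bond);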
Reproducer:
$ insmod bonding.ko mode=0 miimon=100 max_bonds=2
$ ifconfig bond0 up
$ ifenslave bond0 eth0 eth1
$ ifconfig eth0 down
$ ifenslave -d bond0 eth1
$ cat /proc/net/bonding/bond0
Fixes: ff59c4563a ("[PATCH] bonding: support carrier state for master")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 35a79e64de29e8d57a5989aac57611c0cd29e13e upstream.
When 'ping' changes to use PING socket instead of RAW socket by:
# sysctl -w net.ipv4.ping_group_range="0 100"
There is another regression caused when matching sk_bound_dev_if
and dif, RAW socket is using inet_iif() while PING socket lookup
is using skb->dev->ifindex, the cmd below fails due to this:
# ip link add dummy0 type dummy
# ip link set dummy0 up
# ip addr add 192.168.111.1/24 dev dummy0
# ping -I dummy0 192.168.111.1 -c1
The issue was also reported at:
https://github.com/iputils/iputils/issues/104
but it was fixed in iputils in the wrong way, by not binding to the device
when the destination IP is on the device; that in turn causes some
kselftests to fail, as Jianlin noticed.
This patch is to use inet(6)_iif and inet(6)_sdif to get dif and
sdif for PING socket, and keep consistent with RAW socket.
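In code terms the lookup should derive the indexes the same way the RAW
path does, roughly (sketch):

    /* instead of skb->dev->ifindex, use the same helpers as the RAW lookup */
    int dif  = inet_iif(skb);      /* inet6_iif(skb) on the IPv6 side */
    int sdif = inet_sdif(skb);     /* inet6_sdif(skb) on the IPv6 side */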
Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6bb9681a43f34f2cab4aad6e2a02da4ce54d13c5 upstream.
The reset input to the LAN9303 chip is active low, and devicetree
gpio handles reflect this. Therefore, the gpio should be requested
with an initial state of high in order for the reset signal to be
asserted. Other uses of the gpio already use the correct polarity.
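A sketch of the corrected request (assuming the usual devm gpiod helper;
surrounding driver code omitted): with an active-low reset line, asking
for a logical HIGH asserts reset at probe time.

    /* "reset" is active low in DT, so logical HIGH == reset asserted */
    chip->reset_gpio = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_HIGH);
    if (IS_ERR(chip->reset_gpio))
            return PTR_ERR(chip->reset_gpio);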
Fixes: a1292595e0 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220209145454.19749-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b0dff5b3b98c5c7ce848151df9da0b3cdf0cc8b upstream.
IPv6 flowlabels historically require a reservation before use,
optionally in exclusive mode (e.g., user-private).
Commit 59c820b231 ("ipv6: elide flowlabel check if no exclusive
leases exist") introduced a fastpath that avoids this check when no
exclusive leases exist in the system, and thus any flowlabel use
will be granted.
That allows skipping the control operation to reserve a flowlabel
entirely. Though with a warning if the fast path fails:
This is an optimization. Robust applications still have to revert to
requesting leases if the fast path fails due to an exclusive lease.
Still, this is subtle. Better isolate network namespaces from each
other. Flowlabels are per-netns. Also record per-netns whether
exclusive leases are in use. Then behavior does not change based on
activity in other netns.
Changes in v2:
- wrap in IS_ENABLED(CONFIG_IPV6) to avoid breakage if disabled
Fixes: 59c820b231 ("ipv6: elide flowlabel check if no exclusive leases exist")
Link: https://lore.kernel.org/netdev/MWHPR2201MB1072BCCCFCE779E4094837ACD0329@MWHPR2201MB1072.namprd22.prod.outlook.com/
Reported-by: Congyu Liu <liu3101@purdue.edu>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Tested-by: Congyu Liu <liu3101@purdue.edu>
Link: https://lore.kernel.org/r/20220215160037.1976072-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2e71ec1a725a794a16e3862791ed43fe5ba6a06b upstream.
When the nft_concat_range test failed, it exit 1 in the code
specifically.
But when part of, or all of the test passed, it will failed the
[ ${passed} -eq 0 ] check and thus exit with 1, which is the same
exit value with failure result. Fix it by exit 0 when passed is not 0.
Fixes: 611973c1e0 ("selftests: netfilter: Introduce tests for sets with range concatenation")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9208492fcaecff8f43915529ae34b3bcb03877c upstream.
vsock_connect() expects that the socket could already be in the
TCP_ESTABLISHED state when the connecting task wakes up with a signal
pending. If this happens the socket will be in the connected table, and
it is not removed when the socket state is reset. In this situation it's
common for the process to retry connect(), and if the connection is
successful the socket will be added to the connected table a second
time, corrupting the list.
Prevent this by calling vsock_remove_connected() if a signal is received
while waiting for a connection. This is harmless if the socket is not in
the connected table, and if it is in the table then removing it will
prevent list corruption from a double add.
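A simplified sketch of the wait loop in vsock_connect() with the added
removal (details elided):

    if (signal_pending(current)) {
            err = sock_intr_errno(timeout);
            /* ... reset sk_state / sock->state, cancel pending packets ... */
            vsock_transport_cancel_pkt(vsk);
            vsock_remove_connected(vsk);    /* harmless if not in the table,
                                             * prevents a later double add */
            goto out_wait;
    }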
Note for backporting: this patch requires d5afa82c97 ("vsock: correct
removal of socket from the list"), which is in all current stable trees
except 4.9.y.
Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a3193cdd5e5b96ac65f04ee42555c216da332af upstream.
Merge module sections only when using Clang LTO. With ld.bfd, merging
sections does not appear to update the symbol tables for the module,
e.g. 'readelf -s' shows the value that a symbol would have had, if
sections were not merged. ld.lld does not show this problem.
The stale symbol table breaks gdb's function disassembler, and presumably
other things, e.g.
gdb -batch -ex "file arch/x86/kvm/kvm.ko" -ex "disassemble kvm_init"
reads the wrong bytes and dumps garbage.
Fixes: dd2776222abb ("kbuild: lto: merge module sections")
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210322234438.502582-1-seanjc@google.com
Cc: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 042e293e16e3aa9794ce60c29f5b7b0c8170f933 ]
When account() is called, and the amount of entropy dips below
random_write_wakeup_bits, we wake up the random writers, so that they
can write some more in. However, the RNDZAPENTCNT/RNDCLEARPOOL ioctl
sets the entropy count to zero -- a potential reduction just like
account() -- but does not unblock writers. This commit adds the missing
logic to that ioctl to unblock waiting writers.
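A sketch of the ioctl side with the missing wakeup added (assuming the
writer waitqueue is random_write_wait, as in drivers/char/random.c of
this era; fasync signalling omitted):

    case RNDZAPENTCNT:
    case RNDCLEARPOOL:
            if (!capable(CAP_SYS_ADMIN))
                    return -EPERM;
            input_pool.entropy_count = 0;
            /* zeroing the count is an entropy reduction, so unblock anyone
             * waiting to write entropy back into the pool */
            wake_up_interruptible(&random_write_wait);
            return 0;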
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dcb85f85fa6f142aae1fe86f399d4503d49f2b60 ]
While the stackleak plugin was already using notrace, objtool is now a
bit more picky. Update the notrace uses to noinstr. Silences the
following objtool warnings when building with:
CONFIG_DEBUG_ENTRY=y
CONFIG_STACK_VALIDATION=y
CONFIG_VMLINUX_VALIDATION=y
CONFIG_GCC_PLUGIN_STACKLEAK=y
vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section
vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section
vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section
vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section
Note that the plugin's addition of calls to stackleak_track_stack() from
noinstr functions is expected to be safe, as it isn't runtime
instrumentation and is self-contained.
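The change amounts to swapping the annotation on the two entry points
(a sketch; other attributes on the real definitions are omitted):

    /* was: notrace */
    asmlinkage void noinstr stackleak_erase(void)
    {
            /* ... */
    }

    /* was: notrace */
    void noinstr stackleak_track_stack(void)
    {
            /* ... */
    }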
Cc: Alexander Popov <alex.popov@linux.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 67d6212afda218d564890d1674bab28e8612170f ]
This reverts commit 774a1221e8.
We need to finish all async code before the module init sequence is
done. In the reverted commit the PF_USED_ASYNC flag was added to mark a
thread that called async_schedule(). Then the PF_USED_ASYNC flag was
used to determine whether or not async_synchronize_full() needs to be
invoked. This works when the modprobe thread is calling async_schedule(),
but it does not work if a module dispatches its init code to a worker
thread which then calls async_schedule().
For example, PCI driver probing is invoked from a worker thread based on
the node where the device is attached:
    if (cpu < nr_cpu_ids)
            error = work_on_cpu(cpu, local_pci_probe, &ddi);
    else
            error = local_pci_probe(&ddi);
We end up in a situation where a worker thread gets the PF_USED_ASYNC
flag set instead of the modprobe thread. As a result,
async_synchronize_full() is not invoked and modprobe completes without
waiting for the async code to finish.
The issue was discovered while loading the pm80xx driver:
(scsi_mod.scan=async)
modprobe pm80xx                          worker
...
  do_init_module()
  ...
    pci_call_probe()
      work_on_cpu(local_pci_probe)
                                           local_pci_probe()
                                             pm8001_pci_probe()
                                               scsi_scan_host()
                                                 async_schedule()
                                                 worker->flags |= PF_USED_ASYNC;
    ...
    < return from worker >
  ...
  if (current->flags & PF_USED_ASYNC) <--- false
    async_synchronize_full();
Commit 21c3c5d280 ("block: don't request module during elevator init")
fixed the deadlock issue which the reverted commit 774a1221e8
("module, async: async_synchronize_full() on module init iff async is
used") tried to fix.
Since commit 0fdff3ec6d ("async, kmod: warn on synchronous
request_module() from async workers") synchronous module loading from
async is not allowed.
Given that the original deadlock issue is fixed and it is no longer
allowed to call synchronous request_module() from async, we can remove
the PF_USED_ASYNC flag to make module init consistently invoke
async_synchronize_full() unless async module probe is requested.
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e25a8d959992f61b64a58fc62fb7951dc6f31d1f ]
This started out with me noticing that "dom0_max_vcpus=<N>" with <N>
larger than the number of physical CPUs reported through ACPI tables
would not bring up the "excess" vCPU-s. Addressing this is the primary
purpose of the change; CPU maps handling is being tidied only as far as
is necessary for the change here (with the effect of also avoiding the
setting up of too much per-CPU infrastructure, i.e. for CPUs which can
never come online).
Noticing that xen_fill_possible_map() is called way too early, whereas
xen_filter_cpu_maps() is called too late (after per-CPU areas were
already set up), and further observing that each of the functions serves
only one of Dom0 or DomU, it looked like it was better to simplify this.
Use the .get_smp_config hook instead, uniformly for Dom0 and DomU.
xen_fill_possible_map() can be dropped altogether, while
xen_filter_cpu_maps() is re-purposed but not otherwise changed.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/2dbd5f0a-9859-ca2d-085e-a02f7166c610@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b6bb1722f34bbdbabed27acdceaf585d300c5fd2 ]
While nvme_rdma_submit_async_event_work checks the ctrl and queue state
before preparing the AER command and scheduling io_work, in order to
fully prevent a race where this check is not reliable the error recovery
work must flush async_event_work before continuing to destroy the admin
queue after setting the ctrl state to RESETTING, such that there is no
race between .submit_async_event and the error recovery handler itself
changing the ctrl state.
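Concretely, the error recovery work gains a flush of the AER work before
the teardown, along these lines (a sketch; this assumes async_event_work
lives in the generic nvme_ctrl, as the text implies):

    /* in the error recovery work, after the ctrl state is set to RESETTING */
    flush_work(&ctrl->ctrl.async_event_work);  /* no .submit_async_event can race past this */
    /* ... now it is safe to tear down the admin queue ... */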
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff9fc7ebf5c06de1ef72a69f9b1ab40af8b07f9e ]
While nvme_tcp_submit_async_event_work checks the ctrl and queue state
before preparing the AER command and scheduling io_work, in order to
fully prevent a race where this check is not reliable the error recovery
work must flush async_event_work before continuing to destroy the admin
queue after setting the ctrl state to RESETTING, such that there is no
race between .submit_async_event and the error recovery handler itself
changing the ctrl state.
Tested-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0fa0f99fc84e41057cbdd2efbfe91c6b2f47dd9d ]
Unlike .queue_rq, in .submit_async_event drivers may not check the ctrl
readiness for AER submission. This may lead to a use-after-free
condition that was observed with nvme-tcp.
The race condition may happen in the following scenario:
1. driver executes its reset_ctrl_work
2. -> nvme_stop_ctrl - flushes ctrl async_event_work
3. ctrl sends AEN which is received by the host, which in turn
schedules AEN handling
4. teardown admin queue (which releases the queue socket)
5. AEN processed, submits another AER, calling the driver to submit
6. driver attempts to send the cmd
==> use-after-free
In order to fix that, add ctrl state check to validate the ctrl
is actually able to accept the AER submission.
This addresses the above race in controller resets because the driver
during teardown should:
1. change ctrl state to RESETTING
2. flush async_event_work (as well as other async work elements)
So after steps 1 and 2, any other AER command will find the ctrl state
to be RESETTING and bail out without submitting the AER.
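The added guard boils down to checking the controller state right before
calling into the transport's .submit_async_event, roughly (sketch; the
exact placement may differ):

    /* only hand the AER to the transport while the controller is live */
    if (ctrl->state == NVME_CTRL_LIVE)
            ctrl->ops->submit_async_event(ctrl);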
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df7abcaa1246e2537ab4016077b5443bb3c09378 ]
Currently a use-after-free may occur if a sas_task is aborted by the upper
layer before we handle the I/O completion in mpi_ssp_completion() or
mpi_sata_completion().
In this case, the following are the two steps in handling those I/O
completions:
- Call complete() to inform the upper layer handler of completion of
the I/O.
- Release driver resources associated with the sas_task in
pm8001_ccb_task_free() call.
When complete() is called, the upper layer may free the sas_task. As such,
we should not touch the associated sas_task afterwards, but we do so in the
pm8001_ccb_task_free() call.
Fix by swapping the complete() and pm8001_ccb_task_free() calls ordering.
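Schematically (the completion object name is hypothetical; only the
ordering matters):

    /* before: the upper layer may free the sas_task inside complete(),
     * yet pm8001_ccb_task_free() still dereferences it afterwards */
    complete(io_completion);
    pm8001_ccb_task_free(pm8001_ha, task, ccb, tag);

    /* after: release driver resources first, while the sas_task is still
     * guaranteed valid, then notify the upper layer */
    pm8001_ccb_task_free(pm8001_ha, task, ccb, tag);
    complete(io_completion);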
Link: https://lore.kernel.org/r/1643289172-165636-4-git-send-email-john.garry@huawei.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 61f162aa4381845acbdc7f2be4dfb694d027c018 ]
Currently a use-after-free may occur if a TMF sas_task is aborted before we
handle the IO completion in mpi_ssp_completion(). The abort occurs due to
timeout.
When the timeout occurs, the SAS_TASK_STATE_ABORTED flag is set and the
sas_task is freed in pm8001_exec_internal_tmf_task().
However, if the I/O completion occurs later, the I/O completion still
thinks that the sas_task is available. Fix this by clearing the ccb->task
if the TMF times out - the I/O completion handler does nothing if this
pointer is cleared.
Link: https://lore.kernel.org/r/1643289172-165636-3-git-send-email-john.garry@huawei.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>