Pull security subsystem fixes from James Morris:
"Two simple fixes for v5.8:
- Fix hook iteration and default value for inode_copy_up_xattr
(KP Singh)
- Fix the key_permission LSM hook function type (Sami Tolvanen)"
* tag 'fixes-v5.8-rc3-a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security: Fix hook iteration and default value for inode_copy_up_xattr
security: fix the key_permission LSM hook function type
Pull integrity updates from Mimi Zohar:
"Include PCRs 8 & 9 in per TPM 2.0 bank boot_aggregate calculation.
Prior to Linux 5.8 the SHA1 "boot_aggregate" value was padded with 0's
and extended into the other TPM 2.0 banks.
Included in the Linux 5.8 open window, TPM 2.0 PCR bank specific
"boot_aggregate" values (PCRs 0 - 7) are calculated and extended into the TPM banks.
Distro releases are now shipping grub2 with TPM support, which extend
PCRs 8 & 9. I'd like for PCRs 8 & 9 to be included in the new
"boot_aggregate" calculations.
For backwards compatibility, if the hash is SHA1, these new PCRs are
not included in the boot aggregate"
* tag 'integrity-v5.8-fix-2' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: extend boot_aggregate with kernel measurements
Instead, expose the key via the input framework, as SW_MACHINE_COVER
The chip-detect GPIO is actually detecting if the cover is closed.
Technically it's possible to use the SD card with open cover. The
only downside is risk of battery falling out and user being able
to physically remove the card.
The behaviour of SD card not being available when the device is
open is unexpected and creates more problems than it solves. There
is a high chance, that more people accidentally break their rootfs
by opening the case without physically removing the card.
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Merlijn Wajer <merlijn@wizzup.org>
Link: https://lore.kernel.org/r/20200612125402.18393-3-merlijn@wizzup.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Since 5.7, we've been using task_work to trigger async running of
requests in the context of the original task. This generally works
great, but there's a case where if the task is currently blocked
in the kernel waiting on a condition to become true, it won't process
task_work. Even though the task is woken, it just checks whatever
condition it's waiting on, and goes back to sleep if it's still false.
This is a problem if that very condition only becomes true when that
task_work is run. An example of that is the task registering an eventfd
with io_uring, and it's now blocked waiting on an eventfd read. That
read could depend on a completion event, and that completion event
won't get trigged until task_work has been run.
Use the TWA_SIGNAL notification for task_work, so that we ensure that
the task always runs the work when queued.
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
So that the target task will exit the wait_event_interruptible-like
loop and call task_work_run() asap.
The patch turns "bool notify" into 0,TWA_RESUME,TWA_SIGNAL enum, the
new TWA_SIGNAL flag implies signal_wake_up(). However, it needs to
avoid the race with recalc_sigpending(), so the patch also adds the
new JOBCTL_TASK_WORK bit included in JOBCTL_PENDING_MASK.
TODO: once this patch is merged we need to change all current users
of task_work_add(notify = true) to use TWA_RESUME.
Cc: stable@vger.kernel.org # v5.7
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Iterating over BPF links attached to network namespace in pre_exit hook is
not safe, even if there is just one. Once link gets auto-detached, that is
its back-pointer to net object is set to NULL, the link can be released and
freed without waiting on netns_bpf_mutex, effectively causing the list
element we are operating on to be freed.
This leads to use-after-free when trying to access the next element on the
list, as reported by KASAN. Bug can be triggered by destroying a network
namespace, while also releasing a link attached to this network namespace.
| ==================================================================
| BUG: KASAN: use-after-free in netns_bpf_pernet_pre_exit+0xd9/0x130
| Read of size 8 at addr ffff888119e0d778 by task kworker/u8:2/177
|
| CPU: 3 PID: 177 Comm: kworker/u8:2 Not tainted 5.8.0-rc1-00197-ga0c04c9d1008-dirty #776
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
| Workqueue: netns cleanup_net
| Call Trace:
| dump_stack+0x9e/0xe0
| print_address_description.constprop.0+0x3a/0x60
| ? netns_bpf_pernet_pre_exit+0xd9/0x130
| kasan_report.cold+0x1f/0x40
| ? netns_bpf_pernet_pre_exit+0xd9/0x130
| netns_bpf_pernet_pre_exit+0xd9/0x130
| cleanup_net+0x30b/0x5b0
| ? unregister_pernet_device+0x50/0x50
| ? rcu_read_lock_bh_held+0xb0/0xb0
| ? _raw_spin_unlock_irq+0x24/0x50
| process_one_work+0x4d1/0xa10
| ? lock_release+0x3e0/0x3e0
| ? pwq_dec_nr_in_flight+0x110/0x110
| ? rwlock_bug.part.0+0x60/0x60
| worker_thread+0x7a/0x5c0
| ? process_one_work+0xa10/0xa10
| kthread+0x1e3/0x240
| ? kthread_create_on_node+0xd0/0xd0
| ret_from_fork+0x1f/0x30
|
| Allocated by task 280:
| save_stack+0x1b/0x40
| __kasan_kmalloc.constprop.0+0xc2/0xd0
| netns_bpf_link_create+0xfe/0x650
| __do_sys_bpf+0x153a/0x2a50
| do_syscall_64+0x59/0x300
| entry_SYSCALL_64_after_hwframe+0x44/0xa9
|
| Freed by task 198:
| save_stack+0x1b/0x40
| __kasan_slab_free+0x12f/0x180
| kfree+0xed/0x350
| process_one_work+0x4d1/0xa10
| worker_thread+0x7a/0x5c0
| kthread+0x1e3/0x240
| ret_from_fork+0x1f/0x30
|
| The buggy address belongs to the object at ffff888119e0d700
| which belongs to the cache kmalloc-192 of size 192
| The buggy address is located 120 bytes inside of
| 192-byte region [ffff888119e0d700, ffff888119e0d7c0)
| The buggy address belongs to the page:
| page:ffffea0004678340 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
| flags: 0x2fffe0000000200(slab)
| raw: 02fffe0000000200 ffffea00045ba8c0 0000000600000006 ffff88811a80ea80
| raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
| page dumped because: kasan: bad access detected
|
| Memory state around the buggy address:
| ffff888119e0d600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
| ffff888119e0d680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
| >ffff888119e0d700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
| ^
| ffff888119e0d780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
| ffff888119e0d800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
| ==================================================================
Remove the "fast-path" for releasing a link that got auto-detached by a
dying network namespace to fix it. This way as long as link is on the list
and netns_bpf mutex is held, we have a guarantee that link memory can be
accessed.
An alternative way to fix this issue would be to safely iterate over the
list of links and ensure there is no access to link object after detaching
it. But, at the moment, optimizing synchronization overhead on link release
without a workload in mind seems like an overkill.
Fixes: ab53cad90e ("bpf, netns: Keep a list of attached bpf_link's")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200630164541.1329993-1-jakub@cloudflare.com
Lorenz Bauer says:
====================
Both sockmap and flow_dissector ingnore various arguments passed to
BPF_PROG_ATTACH and BPF_PROG_DETACH. We can fix the attach case by
checking that the unused arguments are zero. I considered requiring
target_fd to be -1 instead of 0, but this leads to a lot of churn
in selftests. There is also precedent in that bpf_iter already
expects 0 for a similar field. I think that we can come up with a
work around for fd 0 should we need to in the future.
The detach case is more problematic: both cgroups and lirc2 verify
that attach_bpf_fd matches the currently attached program. This
way you need access to the program fd to be able to remove it.
Neither sockmap nor flow_dissector do this. flow_dissector even
has a check for CAP_NET_ADMIN because of this. The patch set
addresses this by implementing the desired behaviour.
There is a possibility for user space breakage: any callers that
don't provide the correct fd will fail with ENOENT. For sockmap
the risk is low: even the selftests assume that sockmap works
the way I described. For flow_dissector the story is less
straightforward, and the selftests use a variety of arguments.
I've includes fixes tags for the oldest commits that allow an easy
backport, however the behaviour dates back to when sockmap and
flow_dissector were introduced. What is the best way to handle these?
This set is based on top of Jakub's work "bpf, netns: Prepare
for multi-prog attachment" available at
https://lore.kernel.org/bpf/87k0zwmhtb.fsf@cloudflare.com/T/
Since v1:
- Adjust selftests
- Implement detach behaviour
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Calling bpf_prog_detach is incorrect, since it takes target_fd as
its argument. The intention here is to pass it as attach_bpf_fd,
so use bpf_prog_detach2 and pass zero for target_fd.
Fixes: 06716e04a0 ("selftests/bpf: Extend test_flow_dissector to cover link creation")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200629095630.7933-7-lmb@cloudflare.com
The sockmap code currently ignores the value of attach_bpf_fd when
detaching a program. This is contrary to the usual behaviour of
checking that attach_bpf_fd represents the currently attached
program.
Ensure that attach_bpf_fd is indeed the currently attached
program. It turns out that all sockmap selftests already do this,
which indicates that this is unlikely to cause breakage.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com
Using BPF_PROG_DETACH on a flow dissector program supports neither
attach_flags nor attach_bpf_fd. Yet no value is enforced for them.
Enforce that attach_flags are zero, and require the current program
to be passed via attach_bpf_fd. This allows us to remove the check
for CAP_SYS_ADMIN, since userspace can now no longer remove
arbitrary flow dissector programs.
Fixes: b27f7bb590 ("flow_dissector: Move out netns_bpf prog callbacks")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200629095630.7933-3-lmb@cloudflare.com
Using BPF_PROG_ATTACH on a flow dissector program supports neither
target_fd, attach_flags or replace_bpf_fd but accepts any value.
Enforce that all of them are zero. This is fine for replace_bpf_fd
since its presence is indicated by BPF_F_REPLACE. It's more
problematic for target_fd, since zero is a valid fd. Should we
want to use the flag later on we'd have to add an exception for
fd 0. The alternative is to force a value like -1. This requires
more changes to tests. There is also precedent for using 0,
since bpf_iter uses this for target_fd as well.
Fixes: b27f7bb590 ("flow_dissector: Move out netns_bpf prog callbacks")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200629095630.7933-2-lmb@cloudflare.com
Jakub Sitnicki says:
====================
This patch set prepares ground for link-based multi-prog attachment for
future netns attach types, with BPF_SK_LOOKUP attach type in mind [0].
Two changes are needed in order to attach and run a series of BPF programs:
1) an bpf_prog_array of programs to run (patch #2), and
2) a list of attached links to keep track of attachments (patch #3).
Nothing changes for BPF flow_dissector. Just as before only one program can
be attached to netns.
In v3 I've simplified patch #2 that introduces bpf_prog_array to take
advantage of the fact that it will hold at most one program for now.
In particular, I'm no longer using bpf_prog_array_copy. It turned out to be
less suitable for link operations than I thought as it fails to append the
same BPF program.
bpf_prog_array_replace_item is also gone, because we know we always want to
replace the first element in prog_array.
Naturally the code that handles bpf_prog_array will need change once
more when there is a program type that allows multi-prog attachment. But I
feel it will be better to do it gradually and present it together with
tests that actually exercise multi-prog code paths.
[0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/
v2 -> v3:
- Don't check if run_array is null in link update callback. (Martin)
- Allow updating the link with the same BPF program. (Andrii)
- Add patch #4 with a test for the above case.
- Kill bpf_prog_array_replace_item. Access the run_array directly.
- Switch from bpf_prog_array_copy() to bpf_prog_array_alloc(1, ...).
- Replace rcu_deref_protected & RCU_INIT_POINTER with rcu_replace_pointer.
- Drop Andrii's Ack from patch #2. Code changed.
v1 -> v2:
- Show with a (void) cast that bpf_prog_array_replace_item() return value
is ignored on purpose. (Andrii)
- Explain why bpf-cgroup cannot replace programs in bpf_prog_array based
on bpf_prog pointer comparison in patch #2 description. (Andrii)
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This case, while not particularly useful, is worth covering because we
expect the operation to succeed as opposed when re-attaching the same
program directly with PROG_ATTACH.
While at it, update the tests summary that fell out of sync when tests
extended to cover links.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200625141357.910330-5-jakub@cloudflare.com
To support multi-prog link-based attachments for new netns attach types, we
need to keep track of more than one bpf_link per attach type. Hence,
convert net->bpf.links into a list, that currently can be either empty or
have just one item.
Instead of reusing bpf_prog_list from bpf-cgroup, we link together
bpf_netns_link's themselves. This makes list management simpler as we don't
have to allocate, initialize, and later release list elements. We can do
this because multi-prog attachment will be available only for bpf_link, and
we don't need to build a list of programs attached directly and indirectly
via links.
No functional changes intended.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200625141357.910330-4-jakub@cloudflare.com
Prepare for having multi-prog attachments for new netns attach types by
storing programs to run in a bpf_prog_array, which is well suited for
iterating over programs and running them in sequence.
After this change bpf(PROG_QUERY) may block to allocate memory in
bpf_prog_array_copy_to_user() for collected program IDs. This forces a
change in how we protect access to the attached program in the query
callback. Because bpf_prog_array_copy_to_user() can sleep, we switch from
an RCU read lock to holding a mutex that serializes updaters.
Because we allow only one BPF flow_dissector program to be attached to
netns at all times, the bpf_prog_array pointed by net->bpf.run_array is
always either detached (null) or one element long.
No functional changes intended.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200625141357.910330-3-jakub@cloudflare.com
Prepare for using bpf_prog_array to store attached programs by moving out
code that updates the attached program out of flow dissector.
Managing bpf_prog_array is more involved than updating a single bpf_prog
pointer. This will let us do it all from one place, bpf/net_namespace.c, in
the subsequent patch.
No functional change intended.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200625141357.910330-2-jakub@cloudflare.com
Tiger Lake's new unique ACPI device ID for Fan is not valid
because of missing 'C' in the ID. Use correct fan device ID.
Fixes: c248dfe7e0 ("ACPI: fan: Add Tiger Lake ACPI device ID")
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Adjust the reg property to fix the following warning seen with
'make dt_binding_check':
Documentation/devicetree/bindings/thermal/ti,am654-thermal.example.dt.yaml: example-0: thermal@42050000:reg:0: [0, 1107623936, 0, 604] is too long
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/r/20200630122527.28640-1-festevam@gmail.com
Signed-off-by: Rob Herring <robh@kernel.org>
Remove the soc unit address to fix the following warnings seen with
'make dt_binding_check':
Documentation/devicetree/bindings/thermal/thermal-sensor.example.dts:22.20-49.11: Warning (unit_address_vs_reg): /example-0/soc@0: node has a unit name, but no reg or ranges property
Documentation/devicetree/bindings/thermal/thermal-zones.example.dts:23.20-50.11: Warning (unit_address_vs_reg): /example-0/soc@0: node has a unit name, but no reg or ranges property
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/r/20200630121804.27887-1-festevam@gmail.com
[robh: also fix thermal-zones.yaml example]
Signed-off-by: Rob Herring <robh@kernel.org>
Remove the leading zeroes to fix the following warning seen with
'make dt_binding_check':
Documentation/devicetree/bindings/usb/aspeed,usb-vhub.example.dts:37.33-42.23: Warning (unit_address_format): /example-0/usb-vhub@1e6a0000/vhub-strings/string@0409: unit name should not have leading 0s
Reviewed-by: Tao Ren <rentao.bupt@gmail.com>
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/r/20200629214027.16768-1-festevam@gmail.com
Signed-off-by: Rob Herring <robh@kernel.org>
There are two processed schema files:
- processed-schema-examples.yaml
Used for 'make dt_binding_check'. This is always a full schema.
- processed-schema.yaml
Used for 'make dtbs_check'. This may be a full schema, or a smaller
subset if DT_SCHEMA_FILES is given by a user.
If DT_SCHEMA_FILES is not specified, they are the same. You can copy
the former to the latter instead of running dt-mk-schema twice. This
saves the cpu time a lot when you do 'make dt_binding_check dtbs_check'
because building the full schema takes a couple of seconds.
If DT_SCHEMA_FILES is specified, processed-schema.yaml is generated
based on the specified yaml files.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20200625170434.635114-4-masahiroy@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
We are having more and more schema files.
Commit 8b6b80218b ("dt-bindings: Fix command line length limit
calling dt-mk-schema") fixed the 'Argument list too long' error of
the schema checks, but the same error happens while cleaning too.
'make clean' after 'make dt_binding_check' fails as follows:
$ make dt_binding_check
[ snip ]
$ make clean
make[2]: execvp: /bin/sh: Argument list too long
make[2]: *** [scripts/Makefile.clean:52: __clean] Error 127
make[1]: *** [scripts/Makefile.clean:66: Documentation/devicetree/bindings] Error 2
make: *** [Makefile:1763: _clean_Documentation] Error 2
'make dt_binding_check' generates so many .example.dts, .dt.yaml files,
which are passed to the 'rm' command when you run 'make clean'.
I added a small hack to use the 'find' command to clean up most of the
build artifacts before they are processed by scripts/Makefile.clean
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20200625170434.635114-2-masahiroy@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Since commit e69f5dc623 ("dt-bindings: serial: Convert 8250 to
json-schema"), the schema for "ns16550a" is checked.
'make dt_binding_check' emits the following warning:
uart@5,00200000: $nodename:0: 'uart@5,00200000' does not match '^serial(@[0-9a-f,]+)*$'
Rename the node to follow the pattern defined in
Documentation/devicetree/bindings/serial/serial.yaml
While I was here, I removed leading zeros from unit names.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Link: https://lore.kernel.org/r/20200623113242.779241-1-yamada.masahiro@socionext.com
Signed-off-by: Rob Herring <robh@kernel.org>
Sync with upstream dtc primarily to pickup the I2C bus check fixes. The
interrupt_provider check is noisy, so turn it off for now.
This adds the following commits from upstream:
9d7888cbf19c dtc: Consider one-character strings as strings
8259d59f59de checks: Improve i2c reg property checking
fdabcf2980a4 checks: Remove warning for I2C_OWN_SLAVE_ADDRESS
2478b1652c8d libfdt: add extern "C" for C++
f68bfc2668b2 libfdt: trivial typo fix
7be250b4d059 libfdt: Correct condition for reordering blocks
81e0919a3e21 checks: Add interrupt provider test
85e5d839847a Makefile: when building libfdt only, do not add unneeded deps
b28464a550c5 Fix some potential unaligned accesses in dtc
Signed-off-by: Rob Herring <robh@kernel.org>
BPF ringbuf assumes the size to be a multiple of page size and the power of
2 value. The latter is important to avoid division while calculating position
inside the ring buffer and using (N-1) mask instead. This patch fixes omission
to enforce power-of-2 size rule.
Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200630061500.1804799-1-andriin@fb.com
Choo! Choo! All aboard the Split Lock Express, with direct service to
Wreckage!
Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible
SLD-enabled CPU model to avoid writing MSR_TEST_CTRL. MSR_TEST_CTRL
exists, and is writable, on many generations of CPUs. Writing the MSR,
even with '0', can result in bizarre, undocumented behavior.
This fixes a crash on Haswell when resuming from suspend with a live KVM
guest. Because APs use the standard SMP boot flow for resume, they will
go through split_lock_init() and the subsequent RDMSR/WRMSR sequence,
which runs even when sld_state==sld_off to ensure SLD is disabled. On
Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will
succeed and _may_ take the SMT _sibling_ out of VMX root mode.
When KVM has an active guest, KVM performs VMXON as part of CPU onlining
(see kvm_starting_cpu()). Because SMP boot is serialized, the resulting
flow is effectively:
on_each_ap_cpu() {
WRMSR(MSR_TEST_CTRL, 0)
VMXON
}
As a result, the WRMSR can disable VMX on a different CPU that has
already done VMXON. This ultimately results in a #UD on VMPTRLD when
KVM regains control and attempt run its vCPUs.
The above voodoo was confirmed by reworking KVM's VMXON flow to write
MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above.
Further verification of the insanity was done by redoing VMXON on all
APs after the initial WRMSR->VMXON sequence. The additional VMXON,
which should VM-Fail, occasionally succeeded, and also eliminated the
unexpected #UD on VMPTRLD.
The damage done by writing MSR_TEST_CTRL doesn't appear to be limited
to VMX, e.g. after suspend with an active KVM guest, subsequent reboots
almost always hang (even when fudging VMXON), a #UD on a random Jcc was
observed, suspend/resume stability is qualitatively poor, and so on and
so forth.
kernel BUG at arch/x86/kvm/x86.c:386!
CPU: 1 PID: 2592 Comm: CPU 6/KVM Tainted: G D
Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
RIP: 0010:kvm_spurious_fault+0xf/0x20
Call Trace:
vmx_vcpu_load_vmcs+0x1fb/0x2b0
vmx_vcpu_load+0x3e/0x160
kvm_arch_vcpu_load+0x48/0x260
finish_task_switch+0x140/0x260
__schedule+0x460/0x720
_cond_resched+0x2d/0x40
kvm_arch_vcpu_ioctl_run+0x82e/0x1ca0
kvm_vcpu_ioctl+0x363/0x5c0
ksys_ioctl+0x88/0xa0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x4c/0x170
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: dbaba47085 ("x86/split_lock: Rework the initialization flow of split lock detection")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200605192605.7439-1-sean.j.christopherson@intel.com
Bit 8 would be the "global" bit, which does not quite make sense for non-leaf
page table entries. Intel ignores it; AMD ignores it in PDEs and PDPEs, but
reserves it in PML4Es.
Probably, earlier versions of the AMD manual documented it as reserved in PDPEs
as well, and that behavior made it into KVM as well as kvm-unit-tests; fix it.
Cc: stable@vger.kernel.org
Reported-by: Nadav Amit <namit@vmware.com>
Fixes: a0c0feb579 ("KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD", 2014-09-03)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In flush_delete_work, instead of flushing each individual pending
delayed work item, cancel and re-queue them for immediate execution.
The waiting isn't needed here because we're already waiting for all
queued work items to complete in gfs2_flush_delete_work. This makes the
code more efficient, but more importantly, it avoids sleeping during a
rhashtable walk, inside rcu_read_lock().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Log flush operations (gfs2_log_flush()) can target a specific transaction.
But if the function encounters errors (e.g. io errors) and withdraws,
the transaction was only freed it if was queued to one of the ail lists.
If the withdraw occurred before the transaction was queued to the ail1
list, function ail_drain never freed it. The result was:
BUG gfs2_trans: Objects remaining in gfs2_trans on __kmem_cache_shutdown()
This patch makes log_flush() add the targeted transaction to the ail1
list so that function ail_drain() will find and free it properly.
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Callers expect gfs2_inode_lookup to return an inode pointer or ERR_PTR(error).
Commit b66648ad6d caused it to return NULL instead of ERR_PTR(-ESTALE) in
some cases. Fix that.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: b66648ad6d ("gfs2: Move inode generation number check into gfs2_inode_lookup")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
If NO_DMA=y (e.g. Sun-3 all{mod,yes}-config):
drivers/iommu/dma-iommu.o: In function `iommu_dma_mmap':
dma-iommu.c:(.text+0x92e): undefined reference to `dma_pgprot'
IOMMU_DMA must not be selected, unless HAS_DMA=y.
Hence fix this by making SUN50I_IOMMU depend on HAS_DMA.
Fixes: 4100b8c229 ("iommu: Add Allwinner H6 IOMMU driver")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20200629121146.24011-1-geert@linux-m68k.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull irqchip fixes from Marc Zyngier:
- Fix atomicity of affinity update in the GIC driver
- Don't sleep in atomic when waiting for a GICv4.1 RD to respond
- Fix a couple of typos in user-visible messages