| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
mm/filemap: fix filemap_get_folios_contig THP panic
Patch series "memfd-pin huge page fixes".
Fix multiple bugs that occur when using memfd_pin_folios with hugetlb
pages and THP. The hugetlb bugs only bite when the page is not yet
faulted in when memfd_pin_folios is called. The THP bug bites when the
starting offset passed to memfd_pin_folios is not huge page aligned. See
the commit messages for details.
This patch (of 5):
memfd_pin_folios on memory backed by THP panics if the requested start
offset is not huge page aligned:
BUG: kernel NULL pointer dereference, address: 0000000000000036
RIP: 0010:filemap_get_folios_contig+0xdf/0x290
RSP: 0018:ffffc9002092fbe8 EFLAGS: 00010202
RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000002
The fault occurs here, because xas_load returns a folio with value 2:
filemap_get_folios_contig()
for (folio = xas_load(&xas); folio && xas.xa_index <= end;
folio = xas_next(&xas)) {
...
if (!folio_try_get(folio)) <-- BOOM
"2" is an xarray sibling entry. We get it because memfd_pin_folios does
not round the indices passed to filemap_get_folios_contig to huge page
boundaries for THP, so we load from the middle of a huge page range see a
sibling. (It does round for hugetlbfs, at the is_file_hugepages test).
To fix, if the folio is a sibling, then return the next index as the
starting point for the next call to filemap_get_folios_contig. |
| In the Linux kernel, the following vulnerability has been resolved:
mm/gup: fix memfd_pin_folios alloc race panic
If memfd_pin_folios tries to create a hugetlb page, but someone else
already did, then folio gets the value -EEXIST here:
folio = memfd_alloc_folio(memfd, start_idx);
if (IS_ERR(folio)) {
ret = PTR_ERR(folio);
if (ret != -EEXIST)
goto err;
then on the next trip through the "while start_idx" loop we panic here:
if (folio) {
folio_put(folio);
To fix, set the folio to NULL on error. |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: send: fix buffer overflow detection when copying path to cache entry
Starting with commit c0247d289e73 ("btrfs: send: annotate struct
name_cache_entry with __counted_by()") we annotated the variable length
array "name" from the name_cache_entry structure with __counted_by() to
improve overflow detection. However that alone was not correct, because
the length of that array does not match the "name_len" field - it matches
that plus 1 to include the NUL string terminator, so that makes a
fortified kernel think there's an overflow and report a splat like this:
strcpy: detected buffer overflow: 20 byte write of buffer size 19
WARNING: CPU: 3 PID: 3310 at __fortify_report+0x45/0x50
CPU: 3 UID: 0 PID: 3310 Comm: btrfs Not tainted 6.11.0-prnet #1
Hardware name: CompuLab Ltd. sbc-ihsw/Intense-PC2 (IPC2), BIOS IPC2_3.330.7 X64 03/15/2018
RIP: 0010:__fortify_report+0x45/0x50
Code: 48 8b 34 (...)
RSP: 0018:ffff97ebc0d6f650 EFLAGS: 00010246
RAX: 7749924ef60fa600 RBX: ffff8bf5446a521a RCX: 0000000000000027
RDX: 00000000ffffdfff RSI: ffff97ebc0d6f548 RDI: ffff8bf84e7a1cc8
RBP: ffff8bf548574080 R08: ffffffffa8c40e10 R09: 0000000000005ffd
R10: 0000000000000004 R11: ffffffffa8c70e10 R12: ffff8bf551eef400
R13: 0000000000000000 R14: 0000000000000013 R15: 00000000000003a8
FS: 00007fae144de8c0(0000) GS:ffff8bf84e780000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fae14691690 CR3: 00000001027a2003 CR4: 00000000001706f0
Call Trace:
<TASK>
? __warn+0x12a/0x1d0
? __fortify_report+0x45/0x50
? report_bug+0x154/0x1c0
? handle_bug+0x42/0x70
? exc_invalid_op+0x1a/0x50
? asm_exc_invalid_op+0x1a/0x20
? __fortify_report+0x45/0x50
__fortify_panic+0x9/0x10
__get_cur_name_and_parent+0x3bc/0x3c0
get_cur_path+0x207/0x3b0
send_extent_data+0x709/0x10d0
? find_parent_nodes+0x22df/0x25d0
? mas_nomem+0x13/0x90
? mtree_insert_range+0xa5/0x110
? btrfs_lru_cache_store+0x5f/0x1e0
? iterate_extent_inodes+0x52d/0x5a0
process_extent+0xa96/0x11a0
? __pfx_lookup_backref_cache+0x10/0x10
? __pfx_store_backref_cache+0x10/0x10
? __pfx_iterate_backrefs+0x10/0x10
? __pfx_check_extent_item+0x10/0x10
changed_cb+0x6fa/0x930
? tree_advance+0x362/0x390
? memcmp_extent_buffer+0xd7/0x160
send_subvol+0xf0a/0x1520
btrfs_ioctl_send+0x106b/0x11d0
? __pfx___clone_root_cmp_sort+0x10/0x10
_btrfs_ioctl_send+0x1ac/0x240
btrfs_ioctl+0x75b/0x850
__se_sys_ioctl+0xca/0x150
do_syscall_64+0x85/0x160
? __count_memcg_events+0x69/0x100
? handle_mm_fault+0x1327/0x15c0
? __se_sys_rt_sigprocmask+0xf1/0x180
? syscall_exit_to_user_mode+0x75/0xa0
? do_syscall_64+0x91/0x160
? do_user_addr_fault+0x21d/0x630
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fae145eeb4f
Code: 00 48 89 (...)
RSP: 002b:00007ffdf1cb09b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fae145eeb4f
RDX: 00007ffdf1cb0ad0 RSI: 0000000040489426 RDI: 0000000000000004
RBP: 00000000000078fe R08: 00007fae144006c0 R09: 00007ffdf1cb0927
R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffdf1cb1ce8
R13: 0000000000000003 R14: 000055c499fab2e0 R15: 0000000000000004
</TASK>
Fix this by not storing the NUL string terminator since we don't actually
need it for name cache entries, this way "name_len" corresponds to the
actual size of the "name" array. This requires marking the "name" array
field with __nonstring and using memcpy() instead of strcpy() as
recommended by the guidelines at:
https://github.com/KSPP/linux/issues/90 |
| In the Linux kernel, the following vulnerability has been resolved:
drm/xe/vm: move xa_alloc to prevent UAF
Evil user can guess the next id of the vm before the ioctl completes and
then call vm destroy ioctl to trigger UAF since create ioctl is still
referencing the same vm. Move the xa_alloc all the way to the end to
prevent this.
v2:
- Rebase
(cherry picked from commit dcfd3971327f3ee92765154baebbaece833d3ca9) |
| In the Linux kernel, the following vulnerability has been resolved:
rxrpc: Fix a race between socket set up and I/O thread creation
In rxrpc_open_socket(), it sets up the socket and then sets up the I/O
thread that will handle it. This is a problem, however, as there's a gap
between the two phases in which a packet may come into rxrpc_encap_rcv()
from the UDP packet but we oops when trying to wake the not-yet created I/O
thread.
As a quick fix, just make rxrpc_encap_rcv() discard the packet if there's
no I/O thread yet.
A better, but more intrusive fix would perhaps be to rearrange things such
that the socket creation is done by the I/O thread. |
| In the Linux kernel, the following vulnerability has been resolved:
powercap: intel_rapl: Fix off by one in get_rpi()
The rp->priv->rpi array is either rpi_msr or rpi_tpmi which have
NR_RAPL_PRIMITIVES number of elements. Thus the > needs to be >=
to prevent an off by one access. |
| In the Linux kernel, the following vulnerability has been resolved:
wifi: iwlwifi: mvm: set the cipher for secured NDP ranging
The cipher pointer is not set, but is derefereced trying to set its
content, which leads to a NULL pointer dereference.
Fix it by pointing to the cipher parameter before dereferencing. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/xe/tracing: Fix a potential TP_printk UAF
The commit
afd2627f727b ("tracing: Check "%s" dereference via the field and not the TP_printk format")
exposes potential UAFs in the xe_bo_move trace event.
Fix those by avoiding dereferencing the
xe_mem_type_to_name[] array at TP_printk time.
Since some code refactoring has taken place, explicit backporting may
be needed for kernels older than 6.10. |
| In the Linux kernel, the following vulnerability has been resolved:
media: mediatek: vcodec: Fix H264 stateless decoder smatch warning
Fix a smatch static checker warning on vdec_h264_req_if.c.
Which leads to a kernel crash when fb is NULL. |
| In the Linux kernel, the following vulnerability has been resolved:
fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set
This may be a typo. The comment has said shared locks are
not allowed when this bit is set. If using shared lock, the
wait in `fuse_file_cached_io_open` may be forever. |
| In the Linux kernel, the following vulnerability has been resolved:
KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
on x86 due to a chain of locks and SRCU synchronizations. Translating the
below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on
CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the
fairness of r/w semaphores).
CPU0 CPU1 CPU2
1 lock(&kvm->slots_lock);
2 lock(&vcpu->mutex);
3 lock(&kvm->srcu);
4 lock(cpu_hotplug_lock);
5 lock(kvm_lock);
6 lock(&kvm->slots_lock);
7 lock(cpu_hotplug_lock);
8 sync(&kvm->srcu);
Note, there are likely more potential deadlocks in KVM x86, e.g. the same
pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with
__kvmclock_cpufreq_notifier():
cpuhp_cpufreq_online()
|
-> cpufreq_online()
|
-> cpufreq_gov_performance_limits()
|
-> __cpufreq_driver_target()
|
-> __target_index()
|
-> cpufreq_freq_transition_begin()
|
-> cpufreq_notify_transition()
|
-> ... __kvmclock_cpufreq_notifier()
But, actually triggering such deadlocks is beyond rare due to the
combination of dependencies and timings involved. E.g. the cpufreq
notifier is only used on older CPUs without a constant TSC, mucking with
the NX hugepage mitigation while VMs are running is very uncommon, and
doing so while also onlining/offlining a CPU (necessary to generate
contention on cpu_hotplug_lock) would be even more unusual.
The most robust solution to the general cpu_hotplug_lock issue is likely
to switch vm_list to be an RCU-protected list, e.g. so that x86's cpufreq
notifier doesn't to take kvm_lock. For now, settle for fixing the most
blatant deadlock, as switching to an RCU-protected list is a much more
involved change, but add a comment in locking.rst to call out that care
needs to be taken when walking holding kvm_lock and walking vm_list.
======================================================
WARNING: possible circular locking dependency detected
6.10.0-smp--c257535a0c9d-pip #330 Tainted: G S O
------------------------------------------------------
tee/35048 is trying to acquire lock:
ff6a80eced71e0a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm]
but task is already holding lock:
ffffffffc07abb08 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (kvm_lock){+.+.}-{3:3}:
__mutex_lock+0x6a/0xb40
mutex_lock_nested+0x1f/0x30
kvm_dev_ioctl+0x4fb/0xe50 [kvm]
__se_sys_ioctl+0x7b/0xd0
__x64_sys_ioctl+0x21/0x30
x64_sys_call+0x15d0/0x2e60
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #2 (cpu_hotplug_lock){++++}-{0:0}:
cpus_read_lock+0x2e/0xb0
static_key_slow_inc+0x16/0x30
kvm_lapic_set_base+0x6a/0x1c0 [kvm]
kvm_set_apic_base+0x8f/0xe0 [kvm]
kvm_set_msr_common+0x9ae/0xf80 [kvm]
vmx_set_msr+0xa54/0xbe0 [kvm_intel]
__kvm_set_msr+0xb6/0x1a0 [kvm]
kvm_arch_vcpu_ioctl+0xeca/0x10c0 [kvm]
kvm_vcpu_ioctl+0x485/0x5b0 [kvm]
__se_sys_ioctl+0x7b/0xd0
__x64_sys_ioctl+0x21/0x30
x64_sys_call+0x15d0/0x2e60
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (&kvm->srcu){.+.+}-{0:0}:
__synchronize_srcu+0x44/0x1a0
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix race setting file private on concurrent lseek using same fd
When doing concurrent lseek(2) system calls against the same file
descriptor, using multiple threads belonging to the same process, we have
a short time window where a race happens and can result in a memory leak.
The race happens like this:
1) A program opens a file descriptor for a file and then spawns two
threads (with the pthreads library for example), lets call them
task A and task B;
2) Task A calls lseek with SEEK_DATA or SEEK_HOLE and ends up at
file.c:find_desired_extent() while holding a read lock on the inode;
3) At the start of find_desired_extent(), it extracts the file's
private_data pointer into a local variable named 'private', which has
a value of NULL;
4) Task B also calls lseek with SEEK_DATA or SEEK_HOLE, locks the inode
in shared mode and enters file.c:find_desired_extent(), where it also
extracts file->private_data into its local variable 'private', which
has a NULL value;
5) Because it saw a NULL file private, task A allocates a private
structure and assigns to the file structure;
6) Task B also saw a NULL file private so it also allocates its own file
private and then assigns it to the same file structure, since both
tasks are using the same file descriptor.
At this point we leak the private structure allocated by task A.
Besides the memory leak, there's also the detail that both tasks end up
using the same cached state record in the private structure (struct
btrfs_file_private::llseek_cached_state), which can result in a
use-after-free problem since one task can free it while the other is
still using it (only one task took a reference count on it). Also, sharing
the cached state is not a good idea since it could result in incorrect
results in the future - right now it should not be a problem because it
end ups being used only in extent-io-tree.c:count_range_bits() where we do
range validation before using the cached state.
Fix this by protecting the private assignment and check of a file while
holding the inode's spinlock and keep track of the task that allocated
the private, so that it's used only by that task in order to prevent
user-after-free issues with the cached state record as well as potentially
using it incorrectly in the future. |
| In the Linux kernel, the following vulnerability has been resolved:
erofs: handle overlapped pclusters out of crafted images properly
syzbot reported a task hang issue due to a deadlock case where it is
waiting for the folio lock of a cached folio that will be used for
cache I/Os.
After looking into the crafted fuzzed image, I found it's formed with
several overlapped big pclusters as below:
Ext: logical offset | length : physical offset | length
0: 0.. 16384 | 16384 : 151552.. 167936 | 16384
1: 16384.. 32768 | 16384 : 155648.. 172032 | 16384
2: 32768.. 49152 | 16384 : 537223168.. 537239552 | 16384
...
Here, extent 0/1 are physically overlapped although it's entirely
_impossible_ for normal filesystem images generated by mkfs.
First, managed folios containing compressed data will be marked as
up-to-date and then unlocked immediately (unlike in-place folios) when
compressed I/Os are complete. If physical blocks are not submitted in
the incremental order, there should be separate BIOs to avoid dependency
issues. However, the current code mis-arranges z_erofs_fill_bio_vec()
and BIO submission which causes unexpected BIO waits.
Second, managed folios will be connected to their own pclusters for
efficient inter-queries. However, this is somewhat hard to implement
easily if overlapped big pclusters exist. Again, these only appear in
fuzzed images so let's simply fall back to temporary short-lived pages
for correctness.
Additionally, it justifies that referenced managed folios cannot be
truncated for now and reverts part of commit 2080ca1ed3e4 ("erofs: tidy
up `struct z_erofs_bvec`") for simplicity although it shouldn't be any
difference. |
| In the Linux kernel, the following vulnerability has been resolved:
netfs: Delete subtree of 'fs/netfs' when netfs module exits
In netfs_init() or fscache_proc_init(), we create dentry under 'fs/netfs',
but in netfs_exit(), we only delete the proc entry of 'fs/netfs' without
deleting its subtree. This triggers the following WARNING:
==================================================================
remove_proc_entry: removing non-empty directory 'fs/netfs', leaking at least 'requests'
WARNING: CPU: 4 PID: 566 at fs/proc/generic.c:717 remove_proc_entry+0x160/0x1c0
Modules linked in: netfs(-)
CPU: 4 UID: 0 PID: 566 Comm: rmmod Not tainted 6.11.0-rc3 #860
RIP: 0010:remove_proc_entry+0x160/0x1c0
Call Trace:
<TASK>
netfs_exit+0x12/0x620 [netfs]
__do_sys_delete_module.isra.0+0x14c/0x2e0
do_syscall_64+0x4b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
==================================================================
Therefore use remove_proc_subtree() instead of remove_proc_entry() to
fix the above problem. |
| In the Linux kernel, the following vulnerability has been resolved:
crypto: iaa - Fix potential use after free bug
The free_device_compression_mode(iaa_device, device_mode) function frees
"device_mode" but it iss passed to iaa_compression_modes[i]->free() a few
lines later resulting in a use after free.
The good news is that, so far as I can tell, nothing implements the
->free() function and the use after free happens in dead code. But, with
this fix, when something does implement it, we'll be ready. :) |
| In the Linux kernel, the following vulnerability has been resolved:
drm/xe: Use reserved copy engine for user binds on faulting devices
User binds map to engines with can fault, faults depend on user binds
completion, thus we can deadlock. Avoid this by using reserved copy
engine for user binds on faulting devices.
While we are here, normalize bind queue creation with a helper.
v2:
- Pass in extensions to bind queue creation (CI)
v3:
- s/resevered/reserved (Lucas)
- Fix NULL hwe check (Jonathan) |
| In the Linux kernel, the following vulnerability has been resolved:
wifi: ath11k: use work queue to process beacon tx event
Commit 3a415daa3e8b ("wifi: ath11k: add P2P IE in beacon template")
from Feb 28, 2024 (linux-next), leads to the following Smatch static
checker warning:
drivers/net/wireless/ath/ath11k/wmi.c:1742 ath11k_wmi_p2p_go_bcn_ie()
warn: sleeping in atomic context
The reason is that ath11k_bcn_tx_status_event() will directly call might
sleep function ath11k_wmi_cmd_send() during RCU read-side critical
sections. The call trace is like:
ath11k_bcn_tx_status_event()
-> rcu_read_lock()
-> ath11k_mac_bcn_tx_event()
-> ath11k_mac_setup_bcn_tmpl()
……
-> ath11k_wmi_bcn_tmpl()
-> ath11k_wmi_cmd_send()
-> rcu_read_unlock()
Commit 886433a98425 ("ath11k: add support for BSS color change") added the
ath11k_mac_bcn_tx_event(), commit 01e782c89108 ("ath11k: fix warning
of RCU usage for ath11k_mac_get_arvif_by_vdev_id()") added the RCU lock
to avoid warning but also introduced this BUG.
Use work queue to avoid directly calling ath11k_mac_bcn_tx_event()
during RCU critical sections. No need to worry about the deletion of vif
because cancel_work_sync() will drop the work if it doesn't start or
block vif deletion until the running work is done.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 |
| In the Linux kernel, the following vulnerability has been resolved:
wifi: rtw89: remove unused C2H event ID RTW89_MAC_C2H_FUNC_READ_WOW_CAM to prevent out-of-bounds reading
The handler of firmware C2H event RTW89_MAC_C2H_FUNC_READ_WOW_CAM isn't
implemented, but driver expects number of handlers is
NUM_OF_RTW89_MAC_C2H_FUNC_WOW causing out-of-bounds access. Fix it by
removing ID.
Addresses-Coverity-ID: 1598775 ("Out-of-bounds read") |
| In the Linux kernel, the following vulnerability has been resolved:
iommufd: Protect against overflow of ALIGN() during iova allocation
Userspace can supply an iova and uptr such that the target iova alignment
becomes really big and ALIGN() overflows which corrupts the selected area
range during allocation. CONFIG_IOMMUFD_TEST can detect this:
WARNING: CPU: 1 PID: 5092 at drivers/iommu/iommufd/io_pagetable.c:268 iopt_alloc_area_pages drivers/iommu/iommufd/io_pagetable.c:268 [inline]
WARNING: CPU: 1 PID: 5092 at drivers/iommu/iommufd/io_pagetable.c:268 iopt_map_pages+0xf95/0x1050 drivers/iommu/iommufd/io_pagetable.c:352
Modules linked in:
CPU: 1 PID: 5092 Comm: syz-executor294 Not tainted 6.10.0-rc5-syzkaller-00294-g3ffea9a7a6f7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
RIP: 0010:iopt_alloc_area_pages drivers/iommu/iommufd/io_pagetable.c:268 [inline]
RIP: 0010:iopt_map_pages+0xf95/0x1050 drivers/iommu/iommufd/io_pagetable.c:352
Code: fc e9 a4 f3 ff ff e8 1a 8b 4c fc 41 be e4 ff ff ff e9 8a f3 ff ff e8 0a 8b 4c fc 90 0f 0b 90 e9 37 f5 ff ff e8 fc 8a 4c fc 90 <0f> 0b 90 e9 68 f3 ff ff 48 c7 c1 ec 82 ad 8f 80 e1 07 80 c1 03 38
RSP: 0018:ffffc90003ebf9e0 EFLAGS: 00010293
RAX: ffffffff85499fa4 RBX: 00000000ffffffef RCX: ffff888079b49e00
RDX: 0000000000000000 RSI: 00000000ffffffef RDI: 0000000000000000
RBP: ffffc90003ebfc50 R08: ffffffff85499b30 R09: ffffffff85499942
R10: 0000000000000002 R11: ffff888079b49e00 R12: ffff8880228e0010
R13: 0000000000000000 R14: 1ffff920007d7f68 R15: ffffc90003ebfd00
FS: 000055557d760380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000005fdeb8 CR3: 000000007404a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
iommufd_ioas_copy+0x610/0x7b0 drivers/iommu/iommufd/ioas.c:274
iommufd_fops_ioctl+0x4d9/0x5a0 drivers/iommu/iommufd/main.c:421
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Cap the automatic alignment to the huge page size, which is probably a
better idea overall. Huge automatic alignments can fragment and chew up
the available IOVA space without any reason. |
| In the Linux kernel, the following vulnerability has been resolved:
RISC-V: KVM: Don't zero-out PMU snapshot area before freeing data
With the latest Linux-6.11-rc3, the below NULL pointer crash is observed
when SBI PMU snapshot is enabled for the guest and the guest is forcefully
powered-off.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000508
Oops [#1]
Modules linked in: kvm
CPU: 0 UID: 0 PID: 61 Comm: term-poll Not tainted 6.11.0-rc3-00018-g44d7178dd77a #3
Hardware name: riscv-virtio,qemu (DT)
epc : __kvm_write_guest_page+0x94/0xa6 [kvm]
ra : __kvm_write_guest_page+0x54/0xa6 [kvm]
epc : ffffffff01590e98 ra : ffffffff01590e58 sp : ffff8f80001f39b0
gp : ffffffff81512a60 tp : ffffaf80024872c0 t0 : ffffaf800247e000
t1 : 00000000000007e0 t2 : 0000000000000000 s0 : ffff8f80001f39f0
s1 : 00007fff89ac4000 a0 : ffffffff015dd7e8 a1 : 0000000000000086
a2 : 0000000000000000 a3 : ffffaf8000000000 a4 : ffffaf80024882c0
a5 : 0000000000000000 a6 : ffffaf800328d780 a7 : 00000000000001cc
s2 : ffffaf800197bd00 s3 : 00000000000828c4 s4 : ffffaf800248c000
s5 : ffffaf800247d000 s6 : 0000000000001000 s7 : 0000000000001000
s8 : 0000000000000000 s9 : 00007fff861fd500 s10: 0000000000000001
s11: 0000000000800000 t3 : 00000000000004d3 t4 : 00000000000004d3
t5 : ffffffff814126e0 t6 : ffffffff81412700
status: 0000000200000120 badaddr: 0000000000000508 cause: 000000000000000d
[<ffffffff01590e98>] __kvm_write_guest_page+0x94/0xa6 [kvm]
[<ffffffff015943a6>] kvm_vcpu_write_guest+0x56/0x90 [kvm]
[<ffffffff015a175c>] kvm_pmu_clear_snapshot_area+0x42/0x7e [kvm]
[<ffffffff015a1972>] kvm_riscv_vcpu_pmu_deinit.part.0+0xe0/0x14e [kvm]
[<ffffffff015a2ad0>] kvm_riscv_vcpu_pmu_deinit+0x1a/0x24 [kvm]
[<ffffffff0159b344>] kvm_arch_vcpu_destroy+0x28/0x4c [kvm]
[<ffffffff0158e420>] kvm_destroy_vcpus+0x5a/0xda [kvm]
[<ffffffff0159930c>] kvm_arch_destroy_vm+0x14/0x28 [kvm]
[<ffffffff01593260>] kvm_destroy_vm+0x168/0x2a0 [kvm]
[<ffffffff015933d4>] kvm_put_kvm+0x3c/0x58 [kvm]
[<ffffffff01593412>] kvm_vm_release+0x22/0x2e [kvm]
Clearly, the kvm_vcpu_write_guest() function is crashing because it is
being called from kvm_pmu_clear_snapshot_area() upon guest tear down.
To address the above issue, simplify the kvm_pmu_clear_snapshot_area() to
not zero-out PMU snapshot area from kvm_pmu_clear_snapshot_area() because
the guest is anyway being tore down.
The kvm_pmu_clear_snapshot_area() is also called when guest changes
PMU snapshot area of a VCPU but even in this case the previous PMU
snaphsot area must not be zeroed-out because the guest might have
reclaimed the pervious PMU snapshot area for some other purpose. |