CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: Fix use-after-free in l2cap_sock_cleanup_listen()
syzbot reported the splat below without a repro.
In the splat, a single thread calling bt_accept_dequeue() freed sk
and touched it after that.
The root cause would be the racy l2cap_sock_cleanup_listen() call
added by the cited commit.
bt_accept_dequeue() is called under lock_sock() except for
l2cap_sock_release().
Two threads could see the same socket during the list iteration
in bt_accept_dequeue():
CPU1 CPU2 (close())
---- ----
sock_hold(sk) sock_hold(sk);
lock_sock(sk) <-- block close()
sock_put(sk)
bt_accept_unlink(sk)
sock_put(sk) <-- refcnt by bt_accept_enqueue()
release_sock(sk)
lock_sock(sk)
sock_put(sk)
bt_accept_unlink(sk)
sock_put(sk) <-- last refcnt
bt_accept_unlink(sk) <-- UAF
Depending on the timing, the other thread could show up in the
"Freed by task" part.
Let's call l2cap_sock_cleanup_listen() under lock_sock() in
l2cap_sock_release().
[0]:
BUG: KASAN: slab-use-after-free in debug_spin_lock_before kernel/locking/spinlock_debug.c:86 [inline]
BUG: KASAN: slab-use-after-free in do_raw_spin_lock+0x26f/0x2b0 kernel/locking/spinlock_debug.c:115
Read of size 4 at addr ffff88803b7eb1c4 by task syz.5.3276/16995
CPU: 3 UID: 0 PID: 16995 Comm: syz.5.3276 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xcd/0x630 mm/kasan/report.c:482
kasan_report+0xe0/0x110 mm/kasan/report.c:595
debug_spin_lock_before kernel/locking/spinlock_debug.c:86 [inline]
do_raw_spin_lock+0x26f/0x2b0 kernel/locking/spinlock_debug.c:115
spin_lock_bh include/linux/spinlock.h:356 [inline]
release_sock+0x21/0x220 net/core/sock.c:3746
bt_accept_dequeue+0x505/0x600 net/bluetooth/af_bluetooth.c:312
l2cap_sock_cleanup_listen+0x5c/0x2a0 net/bluetooth/l2cap_sock.c:1451
l2cap_sock_release+0x5c/0x210 net/bluetooth/l2cap_sock.c:1425
__sock_release+0xb3/0x270 net/socket.c:649
sock_close+0x1c/0x30 net/socket.c:1439
__fput+0x3ff/0xb70 fs/file_table.c:468
task_work_run+0x14d/0x240 kernel/task_work.c:227
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop+0xeb/0x110 kernel/entry/common.c:43
exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
do_syscall_64+0x3f6/0x4c0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f2accf8ebe9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffdb6cb1378 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
RAX: 0000000000000000 RBX: 00000000000426fb RCX: 00007f2accf8ebe9
RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
RBP: 00007f2acd1b7da0 R08: 0000000000000001 R09: 00000012b6cb166f
R10: 0000001b30e20000 R11: 0000000000000246 R12: 00007f2acd1b609c
R13: 00007f2acd1b6090 R14: ffffffffffffffff R15: 00007ffdb6cb1490
</TASK>
Allocated by task 5326:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
__kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
kasan_kmalloc include/linux/kasan.h:260 [inline]
__do_kmalloc_node mm/slub.c:4365 [inline]
__kmalloc_nopro
---truncated--- |
In the Linux kernel, the following vulnerability has been resolved:
cifs: prevent NULL pointer dereference in UTF16 conversion
There can be a NULL pointer dereference bug here. NULL is passed to
__cifs_sfu_make_node without checks, which passes it unchecked to
cifs_strndup_to_utf16, which in turn passes it to
cifs_local_to_utf16_bytes where '*from' is dereferenced, causing a crash.
This patch adds a check for NULL 'src' in cifs_strndup_to_utf16 and
returns NULL early to prevent dereferencing NULL pointer.
Found by Linux Verification Center (linuxtesting.org) with SVACE |
In the Linux kernel, the following vulnerability has been resolved:
scsi: lpfc: Fix buffer free/clear order in deferred receive path
Fix a use-after-free window by correcting the buffer release sequence in
the deferred receive path. The code freed the RQ buffer first and only
then cleared the context pointer under the lock. Concurrent paths (e.g.,
ABTS and the repost path) also inspect and release the same pointer under
the lock, so the old order could lead to double-free/UAF.
Note that the repost path already uses the correct pattern: detach the
pointer under the lock, then free it after dropping the lock. The
deferred path should do the same. |
In the Linux kernel, the following vulnerability has been resolved:
batman-adv: fix OOB read/write in network-coding decode
batadv_nc_skb_decode_packet() trusts coded_len and checks only against
skb->len. XOR starts at sizeof(struct batadv_unicast_packet), reducing
payload headroom, and the source skb length is not verified, allowing an
out-of-bounds read and a small out-of-bounds write.
Validate that coded_len fits within the payload area of both destination
and source sk_buffs before XORing. |
In the Linux kernel, the following vulnerability has been resolved:
mm: move page table sync declarations to linux/pgtable.h
During our internal testing, we started observing intermittent boot
failures when the machine uses 4-level paging and has a large amount of
persistent memory:
BUG: unable to handle page fault for address: ffffe70000000034
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
RIP: 0010:__init_single_page+0x9/0x6d
Call Trace:
<TASK>
__init_zone_device_page+0x17/0x5d
memmap_init_zone_device+0x154/0x1bb
pagemap_range+0x2e0/0x40f
memremap_pages+0x10b/0x2f0
devm_memremap_pages+0x1e/0x60
dev_dax_probe+0xce/0x2ec [device_dax]
dax_bus_probe+0x6d/0xc9
[... snip ...]
</TASK>
It turns out that the kernel panics while initializing vmemmap (struct
page array) when the vmemmap region spans two PGD entries, because the new
PGD entry is only installed in init_mm.pgd, but not in the page tables of
other tasks.
And looking at __populate_section_memmap():
if (vmemmap_can_optimize(altmap, pgmap))
// does not sync top level page tables
r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
else
// sync top level page tables in x86
r = vmemmap_populate(start, end, nid, altmap);
In the normal path, vmemmap_populate() in arch/x86/mm/init_64.c
synchronizes the top level page table (See commit 9b861528a801 ("x86-64,
mem: Update all PGDs for direct mapping and vmemmap mapping changes")) so
that all tasks in the system can see the new vmemmap area.
However, when vmemmap_can_optimize() returns true, the optimized path
skips synchronization of top-level page tables. This is because
vmemmap_populate_compound_pages() is implemented in core MM code, which
does not handle synchronization of the top-level page tables. Instead,
the core MM has historically relied on each architecture to perform this
synchronization manually.
We're not the first party to encounter a crash caused by not-sync'd top
level page tables: earlier this year, Gwan-gyeong Mun attempted to address
the issue [1] [2] after hitting a kernel panic when x86 code accessed the
vmemmap area before the corresponding top-level entries were synced. At
that time, the issue was believed to be triggered only when struct page
was enlarged for debugging purposes, and the patch did not get further
updates.
It turns out that current approach of relying on each arch to handle the
page table sync manually is fragile because 1) it's easy to forget to sync
the top level page table, and 2) it's also easy to overlook that the
kernel should not access the vmemmap and direct mapping areas before the
sync.
# The solution: Make page table sync more code robust and harder to miss
To address this, Dave Hansen suggested [3] [4] introducing
{pgd,p4d}_populate_kernel() for updating kernel portion of the page tables
and allow each architecture to explicitly perform synchronization when
installing top-level entries. With this approach, we no longer need to
worry about missing the sync step, reducing the risk of future
regressions.
The new interface reuses existing ARCH_PAGE_TABLE_SYNC_MASK,
PGTBL_P*D_MODIFIED and arch_sync_kernel_mappings() facility used by
vmalloc and ioremap to synchronize page tables.
pgd_populate_kernel() looks like this:
static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
p4d_t *p4d)
{
pgd_populate(&init_mm, pgd, p4d);
if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
arch_sync_kernel_mappings(addr, addr);
}
It is worth noting that vmalloc() and apply_to_range() carefully
synchronizes page tables by calling p*d_alloc_track() and
arch_sync_kernel_mappings(), and thus they are not affected by
---truncated--- |
In the Linux kernel, the following vulnerability has been resolved:
wifi: cfg80211: fix use-after-free in cmp_bss()
Following bss_free() quirk introduced in commit 776b3580178f
("cfg80211: track hidden SSID networks properly"), adjust
cfg80211_update_known_bss() to free the last beacon frame
elements only if they're not shared via the corresponding
'hidden_beacon_bss' pointer. |
In the Linux kernel, the following vulnerability has been resolved:
ice: fix NULL access of tx->in_use in ice_ptp_ts_irq
The E810 device has support for a "low latency" firmware interface to
access and read the Tx timestamps. This interface does not use the standard
Tx timestamp logic, due to the latency overhead of proxying sideband
command requests over the firmware AdminQ.
The logic still makes use of the Tx timestamp tracking structure,
ice_ptp_tx, as it uses the same "ready" bitmap to track which Tx
timestamps complete.
Unfortunately, the ice_ptp_ts_irq() function does not check if the tracker
is initialized before its first access. This results in NULL dereference or
use-after-free bugs similar to the following:
[245977.278756] BUG: kernel NULL pointer dereference, address: 0000000000000000
[245977.278774] RIP: 0010:_find_first_bit+0x19/0x40
[245977.278796] Call Trace:
[245977.278809] ? ice_misc_intr+0x364/0x380 [ice]
This can occur if a Tx timestamp interrupt races with the driver reset
logic.
Fix this by only checking the in_use bitmap (and other fields) if the
tracker is marked as initialized. The reset flow will clear the init field
under lock before it tears the tracker down, thus preventing any
use-after-free or NULL access. |
In the Linux kernel, the following vulnerability has been resolved:
ptp: ocp: fix use-after-free bugs causing by ptp_ocp_watchdog
The ptp_ocp_detach() only shuts down the watchdog timer if it is
pending. However, if the timer handler is already running, the
timer_delete_sync() is not called. This leads to race conditions
where the devlink that contains the ptp_ocp is deallocated while
the timer handler is still accessing it, resulting in use-after-free
bugs. The following details one of the race scenarios.
(thread 1) | (thread 2)
ptp_ocp_remove() |
ptp_ocp_detach() | ptp_ocp_watchdog()
if (timer_pending(&bp->watchdog))| bp = timer_container_of()
timer_delete_sync() |
|
devlink_free(devlink) //free |
| bp-> //use
Resolve this by unconditionally calling timer_delete_sync() to ensure
the timer is reliably deactivated, preventing any access after free. |
In the Linux kernel, the following vulnerability has been resolved:
fs: writeback: fix use-after-free in __mark_inode_dirty()
An use-after-free issue occurred when __mark_inode_dirty() get the
bdi_writeback that was in the progress of switching.
CPU: 1 PID: 562 Comm: systemd-random- Not tainted 6.6.56-gb4403bd46a8e #1
......
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __mark_inode_dirty+0x124/0x418
lr : __mark_inode_dirty+0x118/0x418
sp : ffffffc08c9dbbc0
........
Call trace:
__mark_inode_dirty+0x124/0x418
generic_update_time+0x4c/0x60
file_modified+0xcc/0xd0
ext4_buffered_write_iter+0x58/0x124
ext4_file_write_iter+0x54/0x704
vfs_write+0x1c0/0x308
ksys_write+0x74/0x10c
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x48/0x114
el0_svc_common.constprop.0+0xc0/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x40/0xe4
el0t_64_sync_handler+0x120/0x12c
el0t_64_sync+0x194/0x198
Root cause is:
systemd-random-seed kworker
----------------------------------------------------------------------
___mark_inode_dirty inode_switch_wbs_work_fn
spin_lock(&inode->i_lock);
inode_attach_wb
locked_inode_to_wb_and_lock_list
get inode->i_wb
spin_unlock(&inode->i_lock);
spin_lock(&wb->list_lock)
spin_lock(&inode->i_lock)
inode_io_list_move_locked
spin_unlock(&wb->list_lock)
spin_unlock(&inode->i_lock)
spin_lock(&old_wb->list_lock)
inode_do_switch_wbs
spin_lock(&inode->i_lock)
inode->i_wb = new_wb
spin_unlock(&inode->i_lock)
spin_unlock(&old_wb->list_lock)
wb_put_many(old_wb, nr_switched)
cgwb_release
old wb released
wb_wakeup_delayed() accesses wb,
then trigger the use-after-free
issue
Fix this race condition by holding inode spinlock until
wb_wakeup_delayed() finished. |
In the Linux kernel, the following vulnerability has been resolved:
ice: fix NULL access of tx->in_use in ice_ll_ts_intr
Recent versions of the E810 firmware have support for an extra interrupt to
handle report of the "low latency" Tx timestamps coming from the
specialized low latency firmware interface. Instead of polling the
registers, software can wait until the low latency interrupt is fired.
This logic makes use of the Tx timestamp tracking structure, ice_ptp_tx, as
it uses the same "ready" bitmap to track which Tx timestamps complete.
Unfortunately, the ice_ll_ts_intr() function does not check if the
tracker is initialized before its first access. This results in NULL
dereference or use-after-free bugs similar to the issues fixed in the
ice_ptp_ts_irq() function.
Fix this by only checking the in_use bitmap (and other fields) if the
tracker is marked as initialized. The reset flow will clear the init field
under lock before it tears the tracker down, thus preventing any
use-after-free or NULL access. |
In the Linux kernel, the following vulnerability has been resolved:
ALSA: firewire-lib: fix uninitialized flag for AV/C deferred transaction
AV/C deferred transaction was supported at a commit 00a7bb81c20f ("ALSA:
firewire-lib: Add support for deferred transaction") while 'deferrable'
flag can be uninitialized for non-control/notify AV/C transactions.
UBSAN reports it:
kernel: ================================================================================
kernel: UBSAN: invalid-load in /build/linux-aa0B4d/linux-5.15.0/sound/firewire/fcp.c:363:9
kernel: load of value 158 is not a valid value for type '_Bool'
kernel: CPU: 3 PID: 182227 Comm: irq/35-firewire Tainted: P OE 5.15.0-18-generic #18-Ubuntu
kernel: Hardware name: Gigabyte Technology Co., Ltd. AX370-Gaming 5/AX370-Gaming 5, BIOS F42b 08/01/2019
kernel: Call Trace:
kernel: <IRQ>
kernel: show_stack+0x52/0x58
kernel: dump_stack_lvl+0x4a/0x5f
kernel: dump_stack+0x10/0x12
kernel: ubsan_epilogue+0x9/0x45
kernel: __ubsan_handle_load_invalid_value.cold+0x44/0x49
kernel: fcp_response.part.0.cold+0x1a/0x2b [snd_firewire_lib]
kernel: fcp_response+0x28/0x30 [snd_firewire_lib]
kernel: fw_core_handle_request+0x230/0x3d0 [firewire_core]
kernel: handle_ar_packet+0x1d9/0x200 [firewire_ohci]
kernel: ? handle_ar_packet+0x1d9/0x200 [firewire_ohci]
kernel: ? transmit_complete_callback+0x9f/0x120 [firewire_core]
kernel: ar_context_tasklet+0xa8/0x2e0 [firewire_ohci]
kernel: tasklet_action_common.constprop.0+0xea/0xf0
kernel: tasklet_action+0x22/0x30
kernel: __do_softirq+0xd9/0x2e3
kernel: ? irq_finalize_oneshot.part.0+0xf0/0xf0
kernel: do_softirq+0x75/0xa0
kernel: </IRQ>
kernel: <TASK>
kernel: __local_bh_enable_ip+0x50/0x60
kernel: irq_forced_thread_fn+0x7e/0x90
kernel: irq_thread+0xba/0x190
kernel: ? irq_thread_fn+0x60/0x60
kernel: kthread+0x11e/0x140
kernel: ? irq_thread_check_affinity+0xf0/0xf0
kernel: ? set_kthread_struct+0x50/0x50
kernel: ret_from_fork+0x22/0x30
kernel: </TASK>
kernel: ================================================================================
This commit fixes the bug. The bug has no disadvantage for the non-
control/notify AV/C transactions since the flag has an effect for AV/C
response with INTERIM (0x0f) status which is not used for the transactions
in AV/C general specification. |
In the Linux kernel, the following vulnerability has been resolved:
ASoC: atmel: Fix error handling in snd_proto_probe
The device_node pointer is returned by of_parse_phandle() with refcount
incremented. We should use of_node_put() on it when done.
This function only calls of_node_put() in the regular path.
And it will cause refcount leak in error paths.
Fix this by calling of_node_put() in error handling too. |
In the Linux kernel, the following vulnerability has been resolved:
ASoC: rockchip: Fix PM usage reference of rockchip_i2s_tdm_resume
pm_runtime_get_sync will increment pm usage counter
even it failed. Forgetting to putting operation will
result in reference leak here. We fix it by replacing
it with pm_runtime_resume_and_get to keep usage counter
balanced. |
In the Linux kernel, the following vulnerability has been resolved:
ASoC: mediatek: mt8192-mt6359: Fix error handling in mt8192_mt6359_dev_probe
The device_node pointer is returned by of_parse_phandle() with refcount
incremented. We should use of_node_put() on it when done.
This function only calls of_node_put() in the regular path.
And it will cause refcount leak in error paths.
Fix this by calling of_node_put() in error handling too. |
In the Linux kernel, the following vulnerability has been resolved:
drm/msm/a3xx: fix error handling in a3xx_gpu_init()
These error paths returned 1 on failure, instead of a negative error
code. This would lead to an Oops in the caller. A second problem is
that the check for "if (ret != -ENODATA)" did not work because "ret" was
set to 1. |
In the Linux kernel, the following vulnerability has been resolved:
net: dsa: Avoid cross-chip syncing of VLAN filtering
Changes to VLAN filtering are not applicable to cross-chip
notifications.
On a system like this:
.-----. .-----. .-----.
| sw1 +---+ sw2 +---+ sw3 |
'-1-2-' '-1-2-' '-1-2-'
Before this change, upon sw1p1 leaving a bridge, a call to
dsa_port_vlan_filtering would also be made to sw2p1 and sw3p1.
In this scenario:
.---------. .-----. .-----.
| sw1 +---+ sw2 +---+ sw3 |
'-1-2-3-4-' '-1-2-' '-1-2-'
When sw1p4 would leave a bridge, dsa_port_vlan_filtering would be
called for sw2 and sw3 with a non-existing port - leading to array
out-of-bounds accesses and crashes on mv88e6xxx. |
In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix a btf decl_tag bug when tagging a function
syzbot reported a btf decl_tag bug with stack trace below:
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 PID: 3592 Comm: syz-executor914 Not tainted 5.16.0-syzkaller-11424-gb7892f7d5cb2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:btf_type_vlen include/linux/btf.h:231 [inline]
RIP: 0010:btf_decl_tag_resolve+0x83e/0xaa0 kernel/bpf/btf.c:3910
...
Call Trace:
<TASK>
btf_resolve+0x251/0x1020 kernel/bpf/btf.c:4198
btf_check_all_types kernel/bpf/btf.c:4239 [inline]
btf_parse_type_sec kernel/bpf/btf.c:4280 [inline]
btf_parse kernel/bpf/btf.c:4513 [inline]
btf_new_fd+0x19fe/0x2370 kernel/bpf/btf.c:6047
bpf_btf_load kernel/bpf/syscall.c:4039 [inline]
__sys_bpf+0x1cbb/0x5970 kernel/bpf/syscall.c:4679
__do_sys_bpf kernel/bpf/syscall.c:4738 [inline]
__se_sys_bpf kernel/bpf/syscall.c:4736 [inline]
__x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4736
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The kasan error is triggered with an illegal BTF like below:
type 0: void
type 1: int
type 2: decl_tag to func type 3
type 3: func to func_proto type 8
The total number of types is 4 and the type 3 is illegal
since its func_proto type is out of range.
Currently, the target type of decl_tag can be struct/union, var or func.
Both struct/union and var implemented their own 'resolve' callback functions
and hence handled properly in kernel.
But func type doesn't have 'resolve' callback function. When
btf_decl_tag_resolve() tries to check func type, it tries to get
vlen of its func_proto type, which triggered the above kasan error.
To fix the issue, btf_decl_tag_resolve() needs to do btf_func_check()
before trying to accessing func_proto type.
In the current implementation, func type is checked with
btf_func_check() in the main checking function btf_check_all_types().
To fix the above kasan issue, let us implement 'resolve' callback
func type properly. The 'resolve' callback will be also called
in btf_check_all_types() for func types. |
In the Linux kernel, the following vulnerability has been resolved:
drm/bridge: anx7625: Fix overflow issue on reading EDID
The length of EDID block can be longer than 256 bytes, so we should use
`int` instead of `u8` for the `edid_pos` variable. |
In the Linux kernel, the following vulnerability has been resolved:
mptcp: fix possible stall on recvmsg()
recvmsg() can enter an infinite loop if the caller provides the
MSG_WAITALL, the data present in the receive queue is not sufficient to
fulfill the request, and no more data is received by the peer.
When the above happens, mptcp_wait_data() will always return with
no wait, as the MPTCP_DATA_READY flag checked by such function is
set and never cleared in such code path.
Leveraging the above syzbot was able to trigger an RCU stall:
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 0-...!: (10499 ticks this GP) idle=0af/1/0x4000000000000000 softirq=10678/10678 fqs=1
(t=10500 jiffies g=13089 q=109)
rcu: rcu_preempt kthread starved for 10497 jiffies! g13089 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:28696 pid: 14 ppid: 2 flags:0x00004000
Call Trace:
context_switch kernel/sched/core.c:4955 [inline]
__schedule+0x940/0x26f0 kernel/sched/core.c:6236
schedule+0xd3/0x270 kernel/sched/core.c:6315
schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1955
rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2128
kthread+0x405/0x4f0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 8510 Comm: syz-executor827 Not tainted 5.15.0-rc2-next-20210920-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:84 [inline]
RIP: 0010:memory_is_nonzero mm/kasan/generic.c:102 [inline]
RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:128 [inline]
RIP: 0010:memory_is_poisoned mm/kasan/generic.c:159 [inline]
RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
RIP: 0010:kasan_check_range+0xc8/0x180 mm/kasan/generic.c:189
Code: 38 00 74 ed 48 8d 50 08 eb 09 48 83 c0 01 48 39 d0 74 7a 80 38 00 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 <48> 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 80 38 00
RSP: 0018:ffffc9000cd676c8 EFLAGS: 00000283
RAX: ffffed100e9a110e RBX: ffffed100e9a110f RCX: ffffffff88ea062a
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff888074d08870
RBP: ffffed100e9a110e R08: 0000000000000001 R09: ffff888074d08877
R10: ffffed100e9a110e R11: 0000000000000000 R12: ffff888074d08000
R13: ffff888074d08000 R14: ffff888074d08088 R15: ffff888074d08000
FS: 0000555556d8e300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
S: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000180 CR3: 0000000068909000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
test_and_clear_bit include/asm-generic/bitops/instrumented-atomic.h:83 [inline]
mptcp_release_cb+0x14a/0x210 net/mptcp/protocol.c:3016
release_sock+0xb4/0x1b0 net/core/sock.c:3204
mptcp_wait_data net/mptcp/protocol.c:1770 [inline]
mptcp_recvmsg+0xfd1/0x27b0 net/mptcp/protocol.c:2080
inet6_recvmsg+0x11b/0x5e0 net/ipv6/af_inet6.c:659
sock_recvmsg_nosec net/socket.c:944 [inline]
____sys_recvmsg+0x527/0x600 net/socket.c:2626
___sys_recvmsg+0x127/0x200 net/socket.c:2670
do_recvmmsg+0x24d/0x6d0 net/socket.c:2764
__sys_recvmmsg net/socket.c:2843 [inline]
__do_sys_recvmmsg net/socket.c:2866 [inline]
__se_sys_recvmmsg net/socket.c:2859 [inline]
__x64_sys_recvmmsg+0x20b/0x260 net/socket.c:2859
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc200d2
---truncated--- |
In the Linux kernel, the following vulnerability has been resolved:
powerpc/64s: Don't use DSISR for SLB faults
Since commit 46ddcb3950a2 ("powerpc/mm: Show if a bad page fault on data
is read or write.") we use page_fault_is_write(regs->dsisr) in
__bad_page_fault() to determine if the fault is for a read or write, and
change the message printed accordingly.
But SLB faults, aka Data Segment Interrupts, don't set DSISR (Data
Storage Interrupt Status Register) to a useful value. All ISA versions
from v2.03 through v3.1 specify that the Data Segment Interrupt sets
DSISR "to an undefined value". As far as I can see there's no mention of
SLB faults setting DSISR in any BookIV content either.
This manifests as accesses that should be a read being incorrectly
reported as writes, for example, using the xmon "dump" command:
0:mon> d 0x5deadbeef0000000
5deadbeef0000000
[359526.415354][ C6] BUG: Unable to handle kernel data access on write at 0x5deadbeef0000000
[359526.415611][ C6] Faulting instruction address: 0xc00000000010a300
cpu 0x6: Vector: 380 (Data SLB Access) at [c00000000ffbf400]
pc: c00000000010a300: mread+0x90/0x190
If we disassemble the PC, we see a load instruction:
0:mon> di c00000000010a300
c00000000010a300 89490000 lbz r10,0(r9)
We can also see in exceptions-64s.S that the data_access_slb block
doesn't set IDSISR=1, which means it doesn't load DSISR into pt_regs. So
the value we're using to determine if the fault is a read/write is some
stale value in pt_regs from a previous page fault.
Rework the printing logic to separate the SLB fault case out, and only
print read/write in the cases where we can determine it.
The result looks like eg:
0:mon> d 0x5deadbeef0000000
5deadbeef0000000
[ 721.779525][ C6] BUG: Unable to handle kernel data access at 0x5deadbeef0000000
[ 721.779697][ C6] Faulting instruction address: 0xc00000000014cbe0
cpu 0x6: Vector: 380 (Data SLB Access) at [c00000000ffbf390]
0:mon> d 0
0000000000000000
[ 742.793242][ C6] BUG: Kernel NULL pointer dereference at 0x00000000
[ 742.793316][ C6] Faulting instruction address: 0xc00000000014cbe0
cpu 0x6: Vector: 380 (Data SLB Access) at [c00000000ffbf390] |