CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
An issue was discovered in drivers/net/ethernet/intel/igb/igb_main.c in the IGB driver in the Linux kernel before 6.5.3. A buffer size may not be adequate for frames larger than the MTU. |
In the Linux kernel, the following vulnerability has been resolved:
drm/dp_mst: Fix resetting msg rx state after topology removal
If the MST topology is removed during the reception of an MST down reply
or MST up request sideband message, the
drm_dp_mst_topology_mgr::up_req_recv/down_rep_recv states could be reset
from one thread via drm_dp_mst_topology_mgr_set_mst(false), racing with
the reading/parsing of the message from another thread via
drm_dp_mst_handle_down_rep() or drm_dp_mst_handle_up_req(). The race is
possible since the reader/parser doesn't hold any lock while accessing
the reception state. This in turn can lead to a memory corruption in the
reader/parser as described by commit bd2fccac61b4 ("drm/dp_mst: Fix MST
sideband message body length check").
Fix the above by resetting the message reception state if needed before
reading/parsing a message. Another solution would be to hold the
drm_dp_mst_topology_mgr::lock for the whole duration of the message
reception/parsing in drm_dp_mst_handle_down_rep() and
drm_dp_mst_handle_up_req(), however this would require a bigger change.
Since the fix is also needed for stable, opting for the simpler solution
in this patch. |
In the Linux kernel, the following vulnerability has been resolved:
net: avoid potential underflow in qdisc_pkt_len_init() with UFO
After commit 7c6d2ecbda83 ("net: be more gentle about silly gso
requests coming from user") virtio_net_hdr_to_skb() had sanity check
to detect malicious attempts from user space to cook a bad GSO packet.
Then commit cf9acc90c80ec ("net: virtio_net_hdr_to_skb: count
transport header in UFO") while fixing one issue, allowed user space
to cook a GSO packet with the following characteristic :
IPv4 SKB_GSO_UDP, gso_size=3, skb->len = 28.
When this packet arrives in qdisc_pkt_len_init(), we end up
with hdr_len = 28 (IPv4 header + UDP header), matching skb->len
Then the following sets gso_segs to 0 :
gso_segs = DIV_ROUND_UP(skb->len - hdr_len,
shinfo->gso_size);
Then later we set qdisc_skb_cb(skb)->pkt_len to back to zero :/
qdisc_skb_cb(skb)->pkt_len += (gso_segs - 1) * hdr_len;
This leads to the following crash in fq_codel [1]
qdisc_pkt_len_init() is best effort, we only want an estimation
of the bytes sent on the wire, not crashing the kernel.
This patch is fixing this particular issue, a following one
adds more sanity checks for another potential bug.
[1]
[ 70.724101] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 70.724561] #PF: supervisor read access in kernel mode
[ 70.724561] #PF: error_code(0x0000) - not-present page
[ 70.724561] PGD 10ac61067 P4D 10ac61067 PUD 107ee2067 PMD 0
[ 70.724561] Oops: Oops: 0000 [#1] SMP NOPTI
[ 70.724561] CPU: 11 UID: 0 PID: 2163 Comm: b358537762 Not tainted 6.11.0-virtme #991
[ 70.724561] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 70.724561] RIP: 0010:fq_codel_enqueue (net/sched/sch_fq_codel.c:120 net/sched/sch_fq_codel.c:168 net/sched/sch_fq_codel.c:230) sch_fq_codel
[ 70.724561] Code: 24 08 49 c1 e1 06 44 89 7c 24 18 45 31 ed 45 31 c0 31 ff 89 44 24 14 4c 03 8b 90 01 00 00 eb 04 39 ca 73 37 4d 8b 39 83 c7 01 <49> 8b 17 49 89 11 41 8b 57 28 45 8b 5f 34 49 c7 07 00 00 00 00 49
All code
========
0: 24 08 and $0x8,%al
2: 49 c1 e1 06 shl $0x6,%r9
6: 44 89 7c 24 18 mov %r15d,0x18(%rsp)
b: 45 31 ed xor %r13d,%r13d
e: 45 31 c0 xor %r8d,%r8d
11: 31 ff xor %edi,%edi
13: 89 44 24 14 mov %eax,0x14(%rsp)
17: 4c 03 8b 90 01 00 00 add 0x190(%rbx),%r9
1e: eb 04 jmp 0x24
20: 39 ca cmp %ecx,%edx
22: 73 37 jae 0x5b
24: 4d 8b 39 mov (%r9),%r15
27: 83 c7 01 add $0x1,%edi
2a:* 49 8b 17 mov (%r15),%rdx <-- trapping instruction
2d: 49 89 11 mov %rdx,(%r9)
30: 41 8b 57 28 mov 0x28(%r15),%edx
34: 45 8b 5f 34 mov 0x34(%r15),%r11d
38: 49 c7 07 00 00 00 00 movq $0x0,(%r15)
3f: 49 rex.WB
Code starting with the faulting instruction
===========================================
0: 49 8b 17 mov (%r15),%rdx
3: 49 89 11 mov %rdx,(%r9)
6: 41 8b 57 28 mov 0x28(%r15),%edx
a: 45 8b 5f 34 mov 0x34(%r15),%r11d
e: 49 c7 07 00 00 00 00 movq $0x0,(%r15)
15: 49 rex.WB
[ 70.724561] RSP: 0018:ffff95ae85e6fb90 EFLAGS: 00000202
[ 70.724561] RAX: 0000000002000000 RBX: ffff95ae841de000 RCX: 0000000000000000
[ 70.724561] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
[ 70.724561] RBP: ffff95ae85e6fbf8 R08: 0000000000000000 R09: ffff95b710a30000
[ 70.724561] R10: 0000000000000000 R11: bdf289445ce31881 R12: ffff95ae85e6fc58
[ 70.724561] R13: 0000000000000000 R14: 0000000000000040 R15: 0000000000000000
[ 70.724561] FS: 000000002c5c1380(0000) GS:ffff95bd7fcc0000(0000) knlGS:0000000000000000
[ 70.724561] CS: 0010 DS: 0000 ES: 0000 C
---truncated--- |
In the Linux kernel, the following vulnerability has been resolved:
memcg: protect concurrent access to mem_cgroup_idr
Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after
many small jobs") decoupled the memcg IDs from the CSS ID space to fix the
cgroup creation failures. It introduced IDR to maintain the memcg ID
space. The IDR depends on external synchronization mechanisms for
modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace()
happen within css callback and thus are protected through cgroup_mutex
from concurrent modifications. However idr_remove() for mem_cgroup_idr
was not protected against concurrency and can be run concurrently for
different memcgs when they hit their refcnt to zero. Fix that.
We have been seeing list_lru based kernel crashes at a low frequency in
our fleet for a long time. These crashes were in different part of
list_lru code including list_lru_add(), list_lru_del() and reparenting
code. Upon further inspection, it looked like for a given object (dentry
and inode), the super_block's list_lru didn't have list_lru_one for the
memcg of that object. The initial suspicions were either the object is
not allocated through kmem_cache_alloc_lru() or somehow
memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but
returned success. No evidence were found for these cases.
Looking more deeply, we started seeing situations where valid memcg's id
is not present in mem_cgroup_idr and in some cases multiple valid memcgs
have same id and mem_cgroup_idr is pointing to one of them. So, the most
reasonable explanation is that these situations can happen due to race
between multiple idr_remove() calls or race between
idr_alloc()/idr_replace() and idr_remove(). These races are causing
multiple memcgs to acquire the same ID and then offlining of one of them
would cleanup list_lrus on the system for all of them. Later access from
other memcgs to the list_lru cause crashes due to missing list_lru_one. |
In the Linux kernel, the following vulnerability has been resolved:
sched: act_ct: take care of padding in struct zones_ht_key
Blamed commit increased lookup key size from 2 bytes to 16 bytes,
because zones_ht_key got a struct net pointer.
Make sure rhashtable_lookup() is not using the padding bytes
which are not initialized.
BUG: KMSAN: uninit-value in rht_ptr_rcu include/linux/rhashtable.h:376 [inline]
BUG: KMSAN: uninit-value in __rhashtable_lookup include/linux/rhashtable.h:607 [inline]
BUG: KMSAN: uninit-value in rhashtable_lookup include/linux/rhashtable.h:646 [inline]
BUG: KMSAN: uninit-value in rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline]
BUG: KMSAN: uninit-value in tcf_ct_flow_table_get+0x611/0x2260 net/sched/act_ct.c:329
rht_ptr_rcu include/linux/rhashtable.h:376 [inline]
__rhashtable_lookup include/linux/rhashtable.h:607 [inline]
rhashtable_lookup include/linux/rhashtable.h:646 [inline]
rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline]
tcf_ct_flow_table_get+0x611/0x2260 net/sched/act_ct.c:329
tcf_ct_init+0xa67/0x2890 net/sched/act_ct.c:1408
tcf_action_init_1+0x6cc/0xb30 net/sched/act_api.c:1425
tcf_action_init+0x458/0xf00 net/sched/act_api.c:1488
tcf_action_add net/sched/act_api.c:2061 [inline]
tc_ctl_action+0x4be/0x19d0 net/sched/act_api.c:2118
rtnetlink_rcv_msg+0x12fc/0x1410 net/core/rtnetlink.c:6647
netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2550
rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6665
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:745
____sys_sendmsg+0x877/0xb60 net/socket.c:2597
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2651
__sys_sendmsg net/socket.c:2680 [inline]
__do_sys_sendmsg net/socket.c:2689 [inline]
__se_sys_sendmsg net/socket.c:2687 [inline]
__x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2687
x64_sys_call+0x2dd6/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Local variable key created at:
tcf_ct_flow_table_get+0x4a/0x2260 net/sched/act_ct.c:324
tcf_ct_init+0xa67/0x2890 net/sched/act_ct.c:1408 |
In the Linux kernel, the following vulnerability has been resolved:
x86/bhi: Avoid warning in #DB handler due to BHI mitigation
When BHI mitigation is enabled, if SYSENTER is invoked with the TF flag set
then entry_SYSENTER_compat() uses CLEAR_BRANCH_HISTORY and calls the
clear_bhb_loop() before the TF flag is cleared. This causes the #DB handler
(exc_debug_kernel()) to issue a warning because single-step is used outside the
entry_SYSENTER_compat() function.
To address this issue, entry_SYSENTER_compat() should use CLEAR_BRANCH_HISTORY
after making sure the TF flag is cleared.
The problem can be reproduced with the following sequence:
$ cat sysenter_step.c
int main()
{ asm("pushf; pop %ax; bts $8,%ax; push %ax; popf; sysenter"); }
$ gcc -o sysenter_step sysenter_step.c
$ ./sysenter_step
Segmentation fault (core dumped)
The program is expected to crash, and the #DB handler will issue a warning.
Kernel log:
WARNING: CPU: 27 PID: 7000 at arch/x86/kernel/traps.c:1009 exc_debug_kernel+0xd2/0x160
...
RIP: 0010:exc_debug_kernel+0xd2/0x160
...
Call Trace:
<#DB>
? show_regs+0x68/0x80
? __warn+0x8c/0x140
? exc_debug_kernel+0xd2/0x160
? report_bug+0x175/0x1a0
? handle_bug+0x44/0x90
? exc_invalid_op+0x1c/0x70
? asm_exc_invalid_op+0x1f/0x30
? exc_debug_kernel+0xd2/0x160
exc_debug+0x43/0x50
asm_exc_debug+0x1e/0x40
RIP: 0010:clear_bhb_loop+0x0/0xb0
...
</#DB>
<TASK>
? entry_SYSENTER_compat_after_hwframe+0x6e/0x8d
</TASK>
[ bp: Massage commit message. ] |
In the Linux kernel, the following vulnerability has been resolved:
Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
Patch series "mm: Avoid possible overflows in dirty throttling".
Dirty throttling logic assumes dirty limits in page units fit into
32-bits. This patch series makes sure this is true (see patch 2/2 for
more details).
This patch (of 2):
This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.
The commit is broken in several ways. Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this). Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs. We have
div64_ul() in case we want to be safe & cheap. Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot. |
In the Linux kernel, the following vulnerability has been resolved:
usb: atm: cxacru: fix endpoint checking in cxacru_bind()
Syzbot is still reporting quite an old issue [1] that occurs due to
incomplete checking of present usb endpoints. As such, wrong
endpoints types may be used at urb sumbitting stage which in turn
triggers a warning in usb_submit_urb().
Fix the issue by verifying that required endpoint types are present
for both in and out endpoints, taking into account cmd endpoint type.
Unfortunately, this patch has not been tested on real hardware.
[1] Syzbot report:
usb 1-1: BOGUS urb xfer, pipe 1 != type 3
WARNING: CPU: 0 PID: 8667 at drivers/usb/core/urb.c:502 usb_submit_urb+0xed2/0x18a0 drivers/usb/core/urb.c:502
Modules linked in:
CPU: 0 PID: 8667 Comm: kworker/0:4 Not tainted 5.14.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usb_hub_wq hub_event
RIP: 0010:usb_submit_urb+0xed2/0x18a0 drivers/usb/core/urb.c:502
...
Call Trace:
cxacru_cm+0x3c0/0x8e0 drivers/usb/atm/cxacru.c:649
cxacru_card_status+0x22/0xd0 drivers/usb/atm/cxacru.c:760
cxacru_bind+0x7ac/0x11a0 drivers/usb/atm/cxacru.c:1209
usbatm_usb_probe+0x321/0x1ae0 drivers/usb/atm/usbatm.c:1055
cxacru_usb_probe+0xdf/0x1e0 drivers/usb/atm/cxacru.c:1363
usb_probe_interface+0x315/0x7f0 drivers/usb/core/driver.c:396
call_driver_probe drivers/base/dd.c:517 [inline]
really_probe+0x23c/0xcd0 drivers/base/dd.c:595
__driver_probe_device+0x338/0x4d0 drivers/base/dd.c:747
driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:777
__device_attach_driver+0x20b/0x2f0 drivers/base/dd.c:894
bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:427
__device_attach+0x228/0x4a0 drivers/base/dd.c:965
bus_probe_device+0x1e4/0x290 drivers/base/bus.c:487
device_add+0xc2f/0x2180 drivers/base/core.c:3354
usb_set_configuration+0x113a/0x1910 drivers/usb/core/message.c:2170
usb_generic_driver_probe+0xba/0x100 drivers/usb/core/generic.c:238
usb_probe_device+0xd9/0x2c0 drivers/usb/core/driver.c:293 |
In the Linux kernel, the following vulnerability has been resolved:
mm: prevent derefencing NULL ptr in pfn_section_valid()
Commit 5ec8e8ea8b77 ("mm/sparsemem: fix race in accessing
memory_section->usage") changed pfn_section_valid() to add a READ_ONCE()
call around "ms->usage" to fix a race with section_deactivate() where
ms->usage can be cleared. The READ_ONCE() call, by itself, is not enough
to prevent NULL pointer dereference. We need to check its value before
dereferencing it. |
In the Linux kernel, the following vulnerability has been resolved:
filelock: fix potential use-after-free in posix_lock_inode
Light Hsieh reported a KASAN UAF warning in trace_posix_lock_inode().
The request pointer had been changed earlier to point to a lock entry
that was added to the inode's list. However, before the tracepoint could
fire, another task raced in and freed that lock.
Fix this by moving the tracepoint inside the spinlock, which should
ensure that this doesn't happen. |
In the Linux kernel, the following vulnerability has been resolved:
net/sched: Fix UAF when resolving a clash
KASAN reports the following UAF:
BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct]
Read of size 1 at addr ffff888c07603600 by task handler130/6469
Call Trace:
<IRQ>
dump_stack_lvl+0x48/0x70
print_address_description.constprop.0+0x33/0x3d0
print_report+0xc0/0x2b0
kasan_report+0xd0/0x120
__asan_load1+0x6c/0x80
tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct]
tcf_ct_act+0x886/0x1350 [act_ct]
tcf_action_exec+0xf8/0x1f0
fl_classify+0x355/0x360 [cls_flower]
__tcf_classify+0x1fd/0x330
tcf_classify+0x21c/0x3c0
sch_handle_ingress.constprop.0+0x2c5/0x500
__netif_receive_skb_core.constprop.0+0xb25/0x1510
__netif_receive_skb_list_core+0x220/0x4c0
netif_receive_skb_list_internal+0x446/0x620
napi_complete_done+0x157/0x3d0
gro_cell_poll+0xcf/0x100
__napi_poll+0x65/0x310
net_rx_action+0x30c/0x5c0
__do_softirq+0x14f/0x491
__irq_exit_rcu+0x82/0xc0
irq_exit_rcu+0xe/0x20
common_interrupt+0xa1/0xb0
</IRQ>
<TASK>
asm_common_interrupt+0x27/0x40
Allocated by task 6469:
kasan_save_stack+0x38/0x70
kasan_set_track+0x25/0x40
kasan_save_alloc_info+0x1e/0x40
__kasan_krealloc+0x133/0x190
krealloc+0xaa/0x130
nf_ct_ext_add+0xed/0x230 [nf_conntrack]
tcf_ct_act+0x1095/0x1350 [act_ct]
tcf_action_exec+0xf8/0x1f0
fl_classify+0x355/0x360 [cls_flower]
__tcf_classify+0x1fd/0x330
tcf_classify+0x21c/0x3c0
sch_handle_ingress.constprop.0+0x2c5/0x500
__netif_receive_skb_core.constprop.0+0xb25/0x1510
__netif_receive_skb_list_core+0x220/0x4c0
netif_receive_skb_list_internal+0x446/0x620
napi_complete_done+0x157/0x3d0
gro_cell_poll+0xcf/0x100
__napi_poll+0x65/0x310
net_rx_action+0x30c/0x5c0
__do_softirq+0x14f/0x491
Freed by task 6469:
kasan_save_stack+0x38/0x70
kasan_set_track+0x25/0x40
kasan_save_free_info+0x2b/0x60
____kasan_slab_free+0x180/0x1f0
__kasan_slab_free+0x12/0x30
slab_free_freelist_hook+0xd2/0x1a0
__kmem_cache_free+0x1a2/0x2f0
kfree+0x78/0x120
nf_conntrack_free+0x74/0x130 [nf_conntrack]
nf_ct_destroy+0xb2/0x140 [nf_conntrack]
__nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack]
nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack]
__nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack]
tcf_ct_act+0x12ad/0x1350 [act_ct]
tcf_action_exec+0xf8/0x1f0
fl_classify+0x355/0x360 [cls_flower]
__tcf_classify+0x1fd/0x330
tcf_classify+0x21c/0x3c0
sch_handle_ingress.constprop.0+0x2c5/0x500
__netif_receive_skb_core.constprop.0+0xb25/0x1510
__netif_receive_skb_list_core+0x220/0x4c0
netif_receive_skb_list_internal+0x446/0x620
napi_complete_done+0x157/0x3d0
gro_cell_poll+0xcf/0x100
__napi_poll+0x65/0x310
net_rx_action+0x30c/0x5c0
__do_softirq+0x14f/0x491
The ct may be dropped if a clash has been resolved but is still passed to
the tcf_ct_flow_table_process_conn function for further usage. This issue
can be fixed by retrieving ct from skb again after confirming conntrack. |
In the Linux kernel, the following vulnerability has been resolved:
sched/deadline: Fix task_struct reference leak
During the execution of the following stress test with linux-rt:
stress-ng --cyclic 30 --timeout 30 --minimize --quiet
kmemleak frequently reported a memory leak concerning the task_struct:
unreferenced object 0xffff8881305b8000 (size 16136):
comm "stress-ng", pid 614, jiffies 4294883961 (age 286.412s)
object hex dump (first 32 bytes):
02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 .@..............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
debug hex dump (first 16 bytes):
53 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 S...............
backtrace:
[<00000000046b6790>] dup_task_struct+0x30/0x540
[<00000000c5ca0f0b>] copy_process+0x3d9/0x50e0
[<00000000ced59777>] kernel_clone+0xb0/0x770
[<00000000a50befdc>] __do_sys_clone+0xb6/0xf0
[<000000001dbf2008>] do_syscall_64+0x5d/0xf0
[<00000000552900ff>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
The issue occurs in start_dl_timer(), which increments the task_struct
reference count and sets a timer. The timer callback, dl_task_timer,
is supposed to decrement the reference count upon expiration. However,
if enqueue_task_dl() is called before the timer expires and cancels it,
the reference count is not decremented, leading to the leak.
This patch fixes the reference leak by ensuring the task_struct
reference count is properly decremented when the timer is canceled. |
In the Linux kernel, the following vulnerability has been resolved:
mm/huge_memory: don't unpoison huge_zero_folio
When I did memory failure tests recently, below panic occurs:
kernel BUG at include/linux/mm.h:1135!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
Call Trace:
<TASK>
do_shrink_slab+0x14f/0x6a0
shrink_slab+0xca/0x8c0
shrink_node+0x2d0/0x7d0
balance_pgdat+0x33a/0x720
kswapd+0x1f3/0x410
kthread+0xd5/0x100
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
The root cause is that HWPoison flag will be set for huge_zero_folio
without increasing the folio refcnt. But then unpoison_memory() will
decrease the folio refcnt unexpectedly as it appears like a successfully
hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when
releasing huge_zero_folio.
Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue.
We're not prepared to unpoison huge_zero_folio yet. |
In the Linux kernel, the following vulnerability has been resolved:
md/raid5: fix deadlock that raid5d() wait for itself to clear MD_SB_CHANGE_PENDING
Xiao reported that lvm2 test lvconvert-raid-takeover.sh can hang with
small possibility, the root cause is exactly the same as commit
bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")
However, Dan reported another hang after that, and junxiao investigated
the problem and found out that this is caused by plugged bio can't issue
from raid5d().
Current implementation in raid5d() has a weird dependence:
1) md_check_recovery() from raid5d() must hold 'reconfig_mutex' to clear
MD_SB_CHANGE_PENDING;
2) raid5d() handles IO in a deadloop, until all IO are issued;
3) IO from raid5d() must wait for MD_SB_CHANGE_PENDING to be cleared;
This behaviour is introduce before v2.6, and for consequence, if other
context hold 'reconfig_mutex', and md_check_recovery() can't update
super_block, then raid5d() will waste one cpu 100% by the deadloop, until
'reconfig_mutex' is released.
Refer to the implementation from raid1 and raid10, fix this problem by
skipping issue IO if MD_SB_CHANGE_PENDING is still set after
md_check_recovery(), daemon thread will be woken up when 'reconfig_mutex'
is released. Meanwhile, the hang problem will be fixed as well. |
In the Linux kernel, the following vulnerability has been resolved:
ext4: fix mb_cache_entry's e_refcnt leak in ext4_xattr_block_cache_find()
Syzbot reports a warning as follows:
============================================
WARNING: CPU: 0 PID: 5075 at fs/mbcache.c:419 mb_cache_destroy+0x224/0x290
Modules linked in:
CPU: 0 PID: 5075 Comm: syz-executor199 Not tainted 6.9.0-rc6-gb947cc5bf6d7
RIP: 0010:mb_cache_destroy+0x224/0x290 fs/mbcache.c:419
Call Trace:
<TASK>
ext4_put_super+0x6d4/0xcd0 fs/ext4/super.c:1375
generic_shutdown_super+0x136/0x2d0 fs/super.c:641
kill_block_super+0x44/0x90 fs/super.c:1675
ext4_kill_sb+0x68/0xa0 fs/ext4/super.c:7327
[...]
============================================
This is because when finding an entry in ext4_xattr_block_cache_find(), if
ext4_sb_bread() returns -ENOMEM, the ce's e_refcnt, which has already grown
in the __entry_find(), won't be put away, and eventually trigger the above
issue in mb_cache_destroy() due to reference count leakage.
So call mb_cache_entry_put() on the -ENOMEM error branch as a quick fix. |
In the Linux kernel, the following vulnerability has been resolved:
stm class: Fix a double free in stm_register_device()
The put_device(&stm->dev) call will trigger stm_device_release() which
frees "stm" so the vfree(stm) on the next line is a double free. |
In the Linux kernel, the following vulnerability has been resolved:
md: fix resync softlockup when bitmap size is less than array size
Is is reported that for dm-raid10, lvextend + lvchange --syncaction will
trigger following softlockup:
kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [mdX_resync:6976]
CPU: 7 PID: 3588 Comm: mdX_resync Kdump: loaded Not tainted 6.9.0-rc4-next-20240419 #1
RIP: 0010:_raw_spin_unlock_irq+0x13/0x30
Call Trace:
<TASK>
md_bitmap_start_sync+0x6b/0xf0
raid10_sync_request+0x25c/0x1b40 [raid10]
md_do_sync+0x64b/0x1020
md_thread+0xa7/0x170
kthread+0xcf/0x100
ret_from_fork+0x30/0x50
ret_from_fork_asm+0x1a/0x30
And the detailed process is as follows:
md_do_sync
j = mddev->resync_min
while (j < max_sectors)
sectors = raid10_sync_request(mddev, j, &skipped)
if (!md_bitmap_start_sync(..., &sync_blocks))
// md_bitmap_start_sync set sync_blocks to 0
return sync_blocks + sectors_skippe;
// sectors = 0;
j += sectors;
// j never change
Root cause is that commit 301867b1c168 ("md/raid10: check
slab-out-of-bounds in md_bitmap_get_counter") return early from
md_bitmap_get_counter(), without setting returned blocks.
Fix this problem by always set returned blocks from
md_bitmap_get_counter"(), as it used to be.
Noted that this patch just fix the softlockup problem in kernel, the
case that bitmap size doesn't match array size still need to be fixed. |
In the Linux kernel, the following vulnerability has been resolved:
net/mlx5: Add a timeout to acquire the command queue semaphore
Prevent forced completion handling on an entry that has not yet been
assigned an index, causing an out of bounds access on idx = -22.
Instead of waiting indefinitely for the sem, blocking flow now waits for
index to be allocated or a sem acquisition timeout before beginning the
timer for FW completion.
Kernel log example:
mlx5_core 0000:06:00.0: wait_func_handle_exec_timeout:1128:(pid 185911): cmd[-22]: CREATE_UCTX(0xa04) No done completion |
In the Linux kernel, the following vulnerability has been resolved:
net/mlx5: Discard command completions in internal error
Fix use after free when FW completion arrives while device is in
internal error state. Avoid calling completion handler in this case,
since the device will flush the command interface and trigger all
completions manually.
Kernel log:
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
...
RIP: 0010:refcount_warn_saturate+0xd8/0xe0
...
Call Trace:
<IRQ>
? __warn+0x79/0x120
? refcount_warn_saturate+0xd8/0xe0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? refcount_warn_saturate+0xd8/0xe0
cmd_ent_put+0x13b/0x160 [mlx5_core]
mlx5_cmd_comp_handler+0x5f9/0x670 [mlx5_core]
cmd_comp_notifier+0x1f/0x30 [mlx5_core]
notifier_call_chain+0x35/0xb0
atomic_notifier_call_chain+0x16/0x20
mlx5_eq_async_int+0xf6/0x290 [mlx5_core]
notifier_call_chain+0x35/0xb0
atomic_notifier_call_chain+0x16/0x20
irq_int_handler+0x19/0x30 [mlx5_core]
__handle_irq_event_percpu+0x4b/0x160
handle_irq_event+0x2e/0x80
handle_edge_irq+0x98/0x230
__common_interrupt+0x3b/0xa0
common_interrupt+0x7b/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40 |
In the Linux kernel, the following vulnerability has been resolved:
octeontx2-af: avoid off-by-one read from userspace
We try to access count + 1 byte from userspace with memdup_user(buffer,
count + 1). However, the userspace only provides buffer of count bytes and
only these count bytes are verified to be okay to access. To ensure the
copied buffer is NUL terminated, we use memdup_user_nul instead. |