CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
In the Linux kernel, the following vulnerability has been resolved:
dma-direct: Leak pages on dma_set_decrypted() failure
On TDX it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.
DMA could free decrypted/shared pages if dma_set_decrypted() fails. This
should be a rare case. Just leak the pages in this case instead of
freeing them. |
In the Linux kernel, the following vulnerability has been resolved:
wifi: ath11k: decrease MHI channel buffer length to 8KB
Currently buf_len field of ath11k_mhi_config_qca6390 is assigned
with 0, making MHI use a default size, 64KB, to allocate channel
buffers. This is likely to fail in some scenarios where system
memory is highly fragmented and memory compaction or reclaim is
not allowed.
There is a fail report which is caused by it:
kworker/u32:45: page allocation failure: order:4, mode:0x40c00(GFP_NOIO|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 0 PID: 19318 Comm: kworker/u32:45 Not tainted 6.8.0-rc3-1.gae4495f-default #1 openSUSE Tumbleweed (unreleased) 493b6d5b382c603654d7a81fc3c144d59a1dfceb
Workqueue: events_unbound async_run_entry_fn
Call Trace:
<TASK>
dump_stack_lvl+0x47/0x60
warn_alloc+0x13a/0x1b0
? srso_alias_return_thunk+0x5/0xfbef5
? __alloc_pages_direct_compact+0xab/0x210
__alloc_pages_slowpath.constprop.0+0xd3e/0xda0
__alloc_pages+0x32d/0x350
? mhi_prepare_channel+0x127/0x2d0 [mhi 40df44e07c05479f7a6e7b90fba9f0e0031a7814]
__kmalloc_large_node+0x72/0x110
__kmalloc+0x37c/0x480
? mhi_map_single_no_bb+0x77/0xf0 [mhi 40df44e07c05479f7a6e7b90fba9f0e0031a7814]
? mhi_prepare_channel+0x127/0x2d0 [mhi 40df44e07c05479f7a6e7b90fba9f0e0031a7814]
mhi_prepare_channel+0x127/0x2d0 [mhi 40df44e07c05479f7a6e7b90fba9f0e0031a7814]
__mhi_prepare_for_transfer+0x44/0x80 [mhi 40df44e07c05479f7a6e7b90fba9f0e0031a7814]
? __pfx_____mhi_prepare_for_transfer+0x10/0x10 [mhi 40df44e07c05479f7a6e7b90fba9f0e0031a7814]
device_for_each_child+0x5c/0xa0
? __pfx_pci_pm_resume+0x10/0x10
ath11k_core_resume+0x65/0x100 [ath11k a5094e22d7223135c40d93c8f5321cf09fd85e4e]
? srso_alias_return_thunk+0x5/0xfbef5
ath11k_pci_pm_resume+0x32/0x60 [ath11k_pci 830b7bfc3ea80ebef32e563cafe2cb55e9cc73ec]
? srso_alias_return_thunk+0x5/0xfbef5
dpm_run_callback+0x8c/0x1e0
device_resume+0x104/0x340
? __pfx_dpm_watchdog_handler+0x10/0x10
async_resume+0x1d/0x30
async_run_entry_fn+0x32/0x120
process_one_work+0x168/0x330
worker_thread+0x2f5/0x410
? __pfx_worker_thread+0x10/0x10
kthread+0xe8/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
Actually those buffers are used only by QMI target -> host communication.
And for WCN6855 and QCA6390, the largest packet size for that is less
than 6KB. So change buf_len field to 8KB, which results in order 1
allocation if page size is 4KB. In this way, we can at least save some
memory, and as well as decrease the possibility of allocation failure
in those scenarios.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 |
In the Linux kernel, the following vulnerability has been resolved:
block: prevent division by zero in blk_rq_stat_sum()
The expression dst->nr_samples + src->nr_samples may
have zero value on overflow. It is necessary to add
a check to avoid division by zero.
Found by Linux Verification Center (linuxtesting.org) with Svace. |
In the Linux kernel, the following vulnerability has been resolved:
tcp: properly terminate timers for kernel sockets
We had various syzbot reports about tcp timers firing after
the corresponding netns has been dismantled.
Fortunately Josef Bacik could trigger the issue more often,
and could test a patch I wrote two years ago.
When TCP sockets are closed, we call inet_csk_clear_xmit_timers()
to 'stop' the timers.
inet_csk_clear_xmit_timers() can be called from any context,
including when socket lock is held.
This is the reason it uses sk_stop_timer(), aka del_timer().
This means that ongoing timers might finish much later.
For user sockets, this is fine because each running timer
holds a reference on the socket, and the user socket holds
a reference on the netns.
For kernel sockets, we risk that the netns is freed before
timer can complete, because kernel sockets do not hold
reference on the netns.
This patch adds inet_csk_clear_xmit_timers_sync() function
that using sk_stop_timer_sync() to make sure all timers
are terminated before the kernel socket is released.
Modules using kernel sockets close them in their netns exit()
handler.
Also add sock_not_owned_by_me() helper to get LOCKDEP
support : inet_csk_clear_xmit_timers_sync() must not be called
while socket lock is held.
It is very possible we can revert in the future commit
3a58f13a881e ("net: rds: acquire refcount on TCP sockets")
which attempted to solve the issue in rds only.
(net/smc/af_smc.c and net/mptcp/subflow.c have similar code)
We probably can remove the check_net() tests from
tcp_out_of_resources() and __tcp_close() in the future. |
In the Linux kernel, the following vulnerability has been resolved:
mlxbf_gige: call request_irq() after NAPI initialized
The mlxbf_gige driver encounters a NULL pointer exception in
mlxbf_gige_open() when kdump is enabled. The sequence to reproduce
the exception is as follows:
a) enable kdump
b) trigger kdump via "echo c > /proc/sysrq-trigger"
c) kdump kernel executes
d) kdump kernel loads mlxbf_gige module
e) the mlxbf_gige module runs its open() as the
the "oob_net0" interface is brought up
f) mlxbf_gige module will experience an exception
during its open(), something like:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Mem abort info:
ESR = 0x0000000086000004
EC = 0x21: IABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
[0000000000000000] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000086000004 [#1] SMP
CPU: 0 PID: 812 Comm: NetworkManager Tainted: G OE 5.15.0-1035-bluefield #37-Ubuntu
Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : 0x0
lr : __napi_poll+0x40/0x230
sp : ffff800008003e00
x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
Call trace:
0x0
net_rx_action+0x178/0x360
__do_softirq+0x15c/0x428
__irq_exit_rcu+0xac/0xec
irq_exit+0x18/0x2c
handle_domain_irq+0x6c/0xa0
gic_handle_irq+0xec/0x1b0
call_on_irq_stack+0x20/0x2c
do_interrupt_handler+0x5c/0x70
el1_interrupt+0x30/0x50
el1h_64_irq_handler+0x18/0x2c
el1h_64_irq+0x7c/0x80
__setup_irq+0x4c0/0x950
request_threaded_irq+0xf4/0x1bc
mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
__dev_open+0x100/0x220
__dev_change_flags+0x16c/0x1f0
dev_change_flags+0x2c/0x70
do_setlink+0x220/0xa40
__rtnl_newlink+0x56c/0x8a0
rtnl_newlink+0x58/0x84
rtnetlink_rcv_msg+0x138/0x3c4
netlink_rcv_skb+0x64/0x130
rtnetlink_rcv+0x20/0x30
netlink_unicast+0x2ec/0x360
netlink_sendmsg+0x278/0x490
__sock_sendmsg+0x5c/0x6c
____sys_sendmsg+0x290/0x2d4
___sys_sendmsg+0x84/0xd0
__sys_sendmsg+0x70/0xd0
__arm64_sys_sendmsg+0x2c/0x40
invoke_syscall+0x78/0x100
el0_svc_common.constprop.0+0x54/0x184
do_el0_svc+0x30/0xac
el0_svc+0x48/0x160
el0t_64_sync_handler+0xa4/0x12c
el0t_64_sync+0x1a4/0x1a8
Code: bad PC value
---[ end trace 7d1c3f3bf9d81885 ]---
Kernel panic - not syncing: Oops: Fatal exception in interrupt
Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
PHYS_OFFSET: 0x80000000
CPU features: 0x0,000005c1,a3332a5a
Memory Limit: none
---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
The exception happens because there is a pending RX interrupt before the
call to request_irq(RX IRQ) executes. Then, the RX IRQ handler fires
immediately after this request_irq() completes. The
---truncated--- |
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: reject new basechain after table flag update
When dormant flag is toggled, hooks are disabled in the commit phase by
iterating over current chains in table (existing and new).
The following configuration allows for an inconsistent state:
add table x
add chain x y { type filter hook input priority 0; }
add table x { flags dormant; }
add chain x w { type filter hook input priority 1; }
which triggers the following warning when trying to unregister chain w
which is already unregistered.
[ 127.322252] WARNING: CPU: 7 PID: 1211 at net/netfilter/core.c:50 1 __nf_unregister_net_hook+0x21a/0x260
[...]
[ 127.322519] Call Trace:
[ 127.322521] <TASK>
[ 127.322524] ? __warn+0x9f/0x1a0
[ 127.322531] ? __nf_unregister_net_hook+0x21a/0x260
[ 127.322537] ? report_bug+0x1b1/0x1e0
[ 127.322545] ? handle_bug+0x3c/0x70
[ 127.322552] ? exc_invalid_op+0x17/0x40
[ 127.322556] ? asm_exc_invalid_op+0x1a/0x20
[ 127.322563] ? kasan_save_free_info+0x3b/0x60
[ 127.322570] ? __nf_unregister_net_hook+0x6a/0x260
[ 127.322577] ? __nf_unregister_net_hook+0x21a/0x260
[ 127.322583] ? __nf_unregister_net_hook+0x6a/0x260
[ 127.322590] ? __nf_tables_unregister_hook+0x8a/0xe0 [nf_tables]
[ 127.322655] nft_table_disable+0x75/0xf0 [nf_tables]
[ 127.322717] nf_tables_commit+0x2571/0x2620 [nf_tables] |
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: flush pending destroy work before exit_net release
Similar to 2c9f0293280e ("netfilter: nf_tables: flush pending destroy
work before netlink notifier") to address a race between exit_net and
the destroy workqueue.
The trace below shows an element to be released via destroy workqueue
while exit_net path (triggered via module removal) has already released
the set that is used in such transaction.
[ 1360.547789] BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.547861] Read of size 8 at addr ffff888140500cc0 by task kworker/4:1/152465
[ 1360.547870] CPU: 4 PID: 152465 Comm: kworker/4:1 Not tainted 6.8.0+ #359
[ 1360.547882] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
[ 1360.547984] Call Trace:
[ 1360.547991] <TASK>
[ 1360.547998] dump_stack_lvl+0x53/0x70
[ 1360.548014] print_report+0xc4/0x610
[ 1360.548026] ? __virt_addr_valid+0xba/0x160
[ 1360.548040] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 1360.548054] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.548176] kasan_report+0xae/0xe0
[ 1360.548189] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.548312] nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.548447] ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [nf_tables]
[ 1360.548577] ? _raw_spin_unlock_irq+0x18/0x30
[ 1360.548591] process_one_work+0x2f1/0x670
[ 1360.548610] worker_thread+0x4d3/0x760
[ 1360.548627] ? __pfx_worker_thread+0x10/0x10
[ 1360.548640] kthread+0x16b/0x1b0
[ 1360.548653] ? __pfx_kthread+0x10/0x10
[ 1360.548665] ret_from_fork+0x2f/0x50
[ 1360.548679] ? __pfx_kthread+0x10/0x10
[ 1360.548690] ret_from_fork_asm+0x1a/0x30
[ 1360.548707] </TASK>
[ 1360.548719] Allocated by task 192061:
[ 1360.548726] kasan_save_stack+0x20/0x40
[ 1360.548739] kasan_save_track+0x14/0x30
[ 1360.548750] __kasan_kmalloc+0x8f/0xa0
[ 1360.548760] __kmalloc_node+0x1f1/0x450
[ 1360.548771] nf_tables_newset+0x10c7/0x1b50 [nf_tables]
[ 1360.548883] nfnetlink_rcv_batch+0xbc4/0xdc0 [nfnetlink]
[ 1360.548909] nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
[ 1360.548927] netlink_unicast+0x367/0x4f0
[ 1360.548935] netlink_sendmsg+0x34b/0x610
[ 1360.548944] ____sys_sendmsg+0x4d4/0x510
[ 1360.548953] ___sys_sendmsg+0xc9/0x120
[ 1360.548961] __sys_sendmsg+0xbe/0x140
[ 1360.548971] do_syscall_64+0x55/0x120
[ 1360.548982] entry_SYSCALL_64_after_hwframe+0x55/0x5d
[ 1360.548994] Freed by task 192222:
[ 1360.548999] kasan_save_stack+0x20/0x40
[ 1360.549009] kasan_save_track+0x14/0x30
[ 1360.549019] kasan_save_free_info+0x3b/0x60
[ 1360.549028] poison_slab_object+0x100/0x180
[ 1360.549036] __kasan_slab_free+0x14/0x30
[ 1360.549042] kfree+0xb6/0x260
[ 1360.549049] __nft_release_table+0x473/0x6a0 [nf_tables]
[ 1360.549131] nf_tables_exit_net+0x170/0x240 [nf_tables]
[ 1360.549221] ops_exit_list+0x50/0xa0
[ 1360.549229] free_exit_list+0x101/0x140
[ 1360.549236] unregister_pernet_operations+0x107/0x160
[ 1360.549245] unregister_pernet_subsys+0x1c/0x30
[ 1360.549254] nf_tables_module_exit+0x43/0x80 [nf_tables]
[ 1360.549345] __do_sys_delete_module+0x253/0x370
[ 1360.549352] do_syscall_64+0x55/0x120
[ 1360.549360] entry_SYSCALL_64_after_hwframe+0x55/0x5d
(gdb) list *__nft_release_table+0x473
0x1e033 is in __nft_release_table (net/netfilter/nf_tables_api.c:11354).
11349 list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) {
11350 list_del(&flowtable->list);
11351 nft_use_dec(&table->use);
11352 nf_tables_flowtable_destroy(flowtable);
11353 }
11354 list_for_each_entry_safe(set, ns, &table->sets, list) {
11355 list_del(&set->list);
11356 nft_use_dec(&table->use);
11357 if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
11358 nft_map_deactivat
---truncated--- |
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can
concurrent with __nft_flowtable_type_get() within nf_tables_newflowtable().
And thhere is not any protection when iterate over nf_tables_flowtables
list in __nft_flowtable_type_get(). Therefore, there is pertential
data-race of nf_tables_flowtables list entry.
Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list
in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller
nft_flowtable_type_get() to protect the entire type query process. |
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: discard table flag update with pending basechain deletion
Hook unregistration is deferred to the commit phase, same occurs with
hook updates triggered by the table dormant flag. When both commands are
combined, this results in deleting a basechain while leaving its hook
still registered in the core. |
In the Linux kernel, the following vulnerability has been resolved:
netfilter: validate user input for expected length
I got multiple syzbot reports showing old bugs exposed
by BPF after commit 20f2505fb436 ("bpf: Try to avoid kzalloc
in cgroup/{s,g}etsockopt")
setsockopt() @optlen argument should be taken into account
before copying data.
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238
CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
__asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105
copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
copy_from_sockptr include/linux/sockptr.h:55 [inline]
do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101
do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
__sys_setsockopt+0x1ae/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x72/0x7a
RIP: 0033:0x7fd22067dde9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9
RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000
R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8
</TASK>
Allocated by task 7238:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
kasan_kmalloc include/linux/kasan.h:211 [inline]
__do_kmalloc_node mm/slub.c:4069 [inline]
__kmalloc_noprof+0x200/0x410 mm/slub.c:4082
kmalloc_noprof include/linux/slab.h:664 [inline]
__cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869
do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
__sys_setsockopt+0x1ae/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x72/0x7a
The buggy address belongs to the object at ffff88802cd73da0
which belongs to the cache kmalloc-8 of size 8
The buggy address is located 0 bytes inside of
allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73
flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
page_type: 0xffffefff(slab)
raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122
raw: ffff88802cd73020 000000008080007f 00000001ffffefff 00
---truncated--- |
In the Linux kernel, the following vulnerability has been resolved:
bpf, sockmap: Prevent lock inversion deadlock in map delete elem
syzkaller started using corpuses where a BPF tracing program deletes
elements from a sockmap/sockhash map. Because BPF tracing programs can be
invoked from any interrupt context, locks taken during a map_delete_elem
operation must be hardirq-safe. Otherwise a deadlock due to lock inversion
is possible, as reported by lockdep:
CPU0 CPU1
---- ----
lock(&htab->buckets[i].lock);
local_irq_disable();
lock(&host->lock);
lock(&htab->buckets[i].lock);
<Interrupt>
lock(&host->lock);
Locks in sockmap are hardirq-unsafe by design. We expects elements to be
deleted from sockmap/sockhash only in task (normal) context with interrupts
enabled, or in softirq context.
Detect when map_delete_elem operation is invoked from a context which is
_not_ hardirq-unsafe, that is interrupts are disabled, and bail out with an
error.
Note that map updates are not affected by this issue. BPF verifier does not
allow updating sockmap/sockhash from a BPF tracing program today. |
In the Linux kernel, the following vulnerability has been resolved:
mlxbf_gige: stop interface during shutdown
The mlxbf_gige driver intermittantly encounters a NULL pointer
exception while the system is shutting down via "reboot" command.
The mlxbf_driver will experience an exception right after executing
its shutdown() method. One example of this exception is:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000
[0000000000000070] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
CPU: 0 PID: 13 Comm: ksoftirqd/0 Tainted: G S OE 5.15.0-bf.6.gef6992a #1
Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.0.2.12669 Apr 21 2023
pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige]
lr : mlxbf_gige_poll+0x54/0x160 [mlxbf_gige]
sp : ffff8000080d3c10
x29: ffff8000080d3c10 x28: ffffcce72cbb7000 x27: ffff8000080d3d58
x26: ffff0000814e7340 x25: ffff331cd1a05000 x24: ffffcce72c4ea008
x23: ffff0000814e4b40 x22: ffff0000814e4d10 x21: ffff0000814e4128
x20: 0000000000000000 x19: ffff0000814e4a80 x18: ffffffffffffffff
x17: 000000000000001c x16: ffffcce72b4553f4 x15: ffff80008805b8a7
x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101
x11: 7f7f7f7f7f7f7f7f x10: c2ac898b17576267 x9 : ffffcce720fa5404
x8 : ffff000080812138 x7 : 0000000000002e9a x6 : 0000000000000080
x5 : ffff00008de3b000 x4 : 0000000000000000 x3 : 0000000000000001
x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige]
mlxbf_gige_poll+0x54/0x160 [mlxbf_gige]
__napi_poll+0x40/0x1c8
net_rx_action+0x314/0x3a0
__do_softirq+0x128/0x334
run_ksoftirqd+0x54/0x6c
smpboot_thread_fn+0x14c/0x190
kthread+0x10c/0x110
ret_from_fork+0x10/0x20
Code: 8b070000 f9000ea0 f95056c0 f86178a1 (b9407002)
---[ end trace 7cc3941aa0d8e6a4 ]---
Kernel panic - not syncing: Oops: Fatal exception in interrupt
Kernel Offset: 0x4ce722520000 from 0xffff800008000000
PHYS_OFFSET: 0x80000000
CPU features: 0x000005c1,a3330e5a
Memory Limit: none
---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
During system shutdown, the mlxbf_gige driver's shutdown() is always executed.
However, the driver's stop() method will only execute if networking interface
configuration logic within the Linux distribution has been setup to do so.
If shutdown() executes but stop() does not execute, NAPI remains enabled
and this can lead to an exception if NAPI is scheduled while the hardware
interface has only been partially deinitialized.
The networking interface managed by the mlxbf_gige driver must be properly
stopped during system shutdown so that IFF_UP is cleared, the hardware
interface is put into a clean state, and NAPI is fully deinitialized. |
In the Linux kernel, the following vulnerability has been resolved:
udp: do not accept non-tunnel GSO skbs landing in a tunnel
When rx-udp-gro-forwarding is enabled UDP packets might be GROed when
being forwarded. If such packets might land in a tunnel this can cause
various issues and udp_gro_receive makes sure this isn't the case by
looking for a matching socket. This is performed in
udp4/6_gro_lookup_skb but only in the current netns. This is an issue
with tunneled packets when the endpoint is in another netns. In such
cases the packets will be GROed at the UDP level, which leads to various
issues later on. The same thing can happen with rx-gro-list.
We saw this with geneve packets being GROed at the UDP level. In such
case gso_size is set; later the packet goes through the geneve rx path,
the geneve header is pulled, the offset are adjusted and frag_list skbs
are not adjusted with regard to geneve. When those skbs hit
skb_fragment, it will misbehave. Different outcomes are possible
depending on what the GROed skbs look like; from corrupted packets to
kernel crashes.
One example is a BUG_ON[1] triggered in skb_segment while processing the
frag_list. Because gso_size is wrong (geneve header was pulled)
skb_segment thinks there is "geneve header size" of data in frag_list,
although it's in fact the next packet. The BUG_ON itself has nothing to
do with the issue. This is only one of the potential issues.
Looking up for a matching socket in udp_gro_receive is fragile: the
lookup could be extended to all netns (not speaking about performances)
but nothing prevents those packets from being modified in between and we
could still not find a matching socket. It's OK to keep the current
logic there as it should cover most cases but we also need to make sure
we handle tunnel packets being GROed too early.
This is done by extending the checks in udp_unexpected_gso: GSO packets
lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits and landing in a tunnel must
be segmented.
[1] kernel BUG at net/core/skbuff.c:4408!
RIP: 0010:skb_segment+0xd2a/0xf70
__udp_gso_segment+0xaa/0x560 |
In the Linux kernel, the following vulnerability has been resolved:
x86/mm/pat: fix VM_PAT handling in COW mappings
PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios. Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.
Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().
In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.
To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.
We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.
For now, lets keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.
Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>
int main(void)
{
struct io_uring_params p = {};
int ring_fd;
size_t size;
char *map;
ring_fd = io_uring_setup(1, &p);
if (ring_fd < 0) {
perror("io_uring_setup");
return 1;
}
size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
/* Map the submission queue ring MAP_PRIVATE */
map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
ring_fd, IORING_OFF_SQ_RING);
if (map == MAP_FAILED) {
perror("mmap");
return 1;
}
/* We have at least one page. Let's COW it. */
*map = 0;
pause();
return 0;
}
<--- C reproducer --->
On a system with 16 GiB RAM and swap configured:
# ./iouring &
# memhog 16G
# killall iouring
[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[ 301.565725] PKRU: 55555554
[ 301.565944] Call Trace:
[ 301.566148] <TASK>
[ 301.566325] ? untrack_pfn+0xf4/0x100
[ 301.566618] ? __warn+0x81/0x130
[ 301.566876] ? untrack_pfn+0xf4/0x100
[ 3
---truncated--- |
In the Linux kernel, the following vulnerability has been resolved:
x86/coco: Require seeding RNG with RDRAND on CoCo systems
There are few uses of CoCo that don't rely on working cryptography and
hence a working RNG. Unfortunately, the CoCo threat model means that the
VM host cannot be trusted and may actively work against guests to
extract secrets or manipulate computation. Since a malicious host can
modify or observe nearly all inputs to guests, the only remaining source
of entropy for CoCo guests is RDRAND.
If RDRAND is broken -- due to CPU hardware fault -- the RNG as a whole
is meant to gracefully continue on gathering entropy from other sources,
but since there aren't other sources on CoCo, this is catastrophic.
This is mostly a concern at boot time when initially seeding the RNG, as
after that the consequences of a broken RDRAND are much more
theoretical.
So, try at boot to seed the RNG using 256 bits of RDRAND output. If this
fails, panic(). This will also trigger if the system is booted without
RDRAND, as RDRAND is essential for a safe CoCo boot.
Add this deliberately to be "just a CoCo x86 driver feature" and not
part of the RNG itself. Many device drivers and platforms have some
desire to contribute something to the RNG, and add_device_randomness()
is specifically meant for this purpose.
Any driver can call it with seed data of any quality, or even garbage
quality, and it can only possibly make the quality of the RNG better or
have no effect, but can never make it worse.
Rather than trying to build something into the core of the RNG, consider
the particular CoCo issue just a CoCo issue, and therefore separate it
all out into driver (well, arch/platform) code.
[ bp: Massage commit message. ] |
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix UAF in smb2_reconnect_server()
The UAF bug is due to smb2_reconnect_server() accessing a session that
is already being teared down by another thread that is executing
__cifs_put_smb_ses(). This can happen when (a) the client has
connection to the server but no session or (b) another thread ends up
setting @ses->ses_status again to something different than
SES_EXITING.
To fix this, we need to make sure to unconditionally set
@ses->ses_status to SES_EXITING and prevent any other threads from
setting a new status while we're still tearing it down.
The following can be reproduced by adding some delay to right after
the ipc is freed in __cifs_put_smb_ses() - which will give
smb2_reconnect_server() worker a chance to run and then accessing
@ses->ipc:
kinit ...
mount.cifs //srv/share /mnt/1 -o sec=krb5,nohandlecache,echo_interval=10
[disconnect srv]
ls /mnt/1 &>/dev/null
sleep 30
kdestroy
[reconnect srv]
sleep 10
umount /mnt/1
...
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
general protection fault, probably for non-canonical address
0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc2 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39
04/01/2014
Workqueue: cifsiod smb2_reconnect_server [cifs]
RIP: 0010:__list_del_entry_valid_or_report+0x33/0xf0
Code: 4f 08 48 85 d2 74 42 48 85 c9 74 59 48 b8 00 01 00 00 00 00 ad
de 48 39 c2 74 61 48 b8 22 01 00 00 00 00 74 69 <48> 8b 01 48 39 f8 75
7b 48 8b 72 08 48 39 c6 0f 85 88 00 00 00 b8
RSP: 0018:ffffc900001bfd70 EFLAGS: 00010a83
RAX: dead000000000122 RBX: ffff88810da53838 RCX: 6b6b6b6b6b6b6b6b
RDX: 6b6b6b6b6b6b6b6b RSI: ffffffffc02f6878 RDI: ffff88810da53800
RBP: ffff88810da53800 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88810c064000
R13: 0000000000000001 R14: ffff88810c064000 R15: ffff8881039cc000
FS: 0000000000000000(0000) GS:ffff888157c00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe3728b1000 CR3: 000000010caa4000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
? die_addr+0x36/0x90
? exc_general_protection+0x1c1/0x3f0
? asm_exc_general_protection+0x26/0x30
? __list_del_entry_valid_or_report+0x33/0xf0
__cifs_put_smb_ses+0x1ae/0x500 [cifs]
smb2_reconnect_server+0x4ed/0x710 [cifs]
process_one_work+0x205/0x6b0
worker_thread+0x191/0x360
? __pfx_worker_thread+0x10/0x10
kthread+0xe2/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK> |
In the Linux kernel, the following vulnerability has been resolved:
icmp: prevent possible NULL dereferences from icmp_build_probe()
First problem is a double call to __in_dev_get_rcu(), because
the second one could return NULL.
if (__in_dev_get_rcu(dev) && __in_dev_get_rcu(dev)->ifa_list)
Second problem is a read from dev->ip6_ptr with no NULL check:
if (!list_empty(&rcu_dereference(dev->ip6_ptr)->addr_list))
Use the correct RCU API to fix these.
v2: add missing include <net/addrconf.h> |
In the Linux kernel, the following vulnerability has been resolved:
mlxsw: spectrum_acl_tcam: Fix possible use-after-free during activity update
The rule activity update delayed work periodically traverses the list of
configured rules and queries their activity from the device.
As part of this task it accesses the entry pointed by 'ventry->entry',
but this entry can be changed concurrently by the rehash delayed work,
leading to a use-after-free [1].
Fix by closing the race and perform the activity query under the
'vregion->lock' mutex.
[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140
Read of size 8 at addr ffff8881054ed808 by task kworker/0:18/181
CPU: 0 PID: 181 Comm: kworker/0:18 Not tainted 6.9.0-rc2-custom-00781-gd5ab772d32f7 #2
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_rule_activity_update_work
Call Trace:
<TASK>
dump_stack_lvl+0xc6/0x120
print_report+0xce/0x670
kasan_report+0xd7/0x110
mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140
mlxsw_sp_acl_rule_activity_update_work+0x219/0x400
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 1039:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
__kmalloc+0x19c/0x360
mlxsw_sp_acl_tcam_entry_create+0x7b/0x1f0
mlxsw_sp_acl_tcam_vchunk_migrate_all+0x30d/0xb50
mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
Freed by task 1039:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
poison_slab_object+0x102/0x170
__kasan_slab_free+0x14/0x30
kfree+0xc1/0x290
mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3d7/0xb50
mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30 |
In the Linux kernel, the following vulnerability has been resolved:
mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash
The rehash delayed work migrates filters from one region to another
according to the number of available credits.
The migrated from region is destroyed at the end of the work if the
number of credits is non-negative as the assumption is that this is
indicative of migration being complete. This assumption is incorrect as
a non-negative number of credits can also be the result of a failed
migration.
The destruction of a region that still has filters referencing it can
result in a use-after-free [1].
Fix by not destroying the region if migration failed.
[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230
Read of size 8 at addr ffff8881735319e8 by task kworker/0:31/3858
CPU: 0 PID: 3858 Comm: kworker/0:31 Tainted: G W 6.9.0-rc2-custom-00782-gf2275c2157d8 #5
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
Call Trace:
<TASK>
dump_stack_lvl+0xc6/0x120
print_report+0xce/0x670
kasan_report+0xd7/0x110
mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230
mlxsw_sp_acl_ctcam_entry_del+0x2e/0x70
mlxsw_sp_acl_atcam_entry_del+0x81/0x210
mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3cd/0xb50
mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 174:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
__kmalloc+0x19c/0x360
mlxsw_sp_acl_tcam_region_create+0xdf/0x9c0
mlxsw_sp_acl_tcam_vregion_rehash_work+0x954/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
Freed by task 7:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
poison_slab_object+0x102/0x170
__kasan_slab_free+0x14/0x30
kfree+0xc1/0x290
mlxsw_sp_acl_tcam_region_destroy+0x272/0x310
mlxsw_sp_acl_tcam_vregion_rehash_work+0x731/0x1300
process_one_work+0x8eb/0x19b0
worker_thread+0x6c9/0xf70
kthread+0x2c9/0x3b0
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30 |
In the Linux kernel, the following vulnerability has been resolved:
mlxsw: spectrum_acl_tcam: Fix memory leak during rehash
The rehash delayed work migrates filters from one region to another.
This is done by iterating over all chunks (all the filters with the same
priority) in the region and in each chunk iterating over all the
filters.
If the migration fails, the code tries to migrate the filters back to
the old region. However, the rollback itself can also fail in which case
another migration will be erroneously performed. Besides the fact that
this ping pong is not a very good idea, it also creates a problem.
Each virtual chunk references two chunks: The currently used one
('vchunk->chunk') and a backup ('vchunk->chunk2'). During migration the
first holds the chunk we want to migrate filters to and the second holds
the chunk we are migrating filters from.
The code currently assumes - but does not verify - that the backup chunk
does not exist (NULL) if the currently used chunk does not reference the
target region. This assumption breaks when we are trying to rollback a
rollback, resulting in the backup chunk being overwritten and leaked
[1].
Fix by not rolling back a failed rollback and add a warning to avoid
future cases.
[1]
WARNING: CPU: 5 PID: 1063 at lib/parman.c:291 parman_destroy+0x17/0x20
Modules linked in:
CPU: 5 PID: 1063 Comm: kworker/5:11 Tainted: G W 6.9.0-rc2-custom-00784-gc6a05c468a0b #14
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:parman_destroy+0x17/0x20
[...]
Call Trace:
<TASK>
mlxsw_sp_acl_atcam_region_fini+0x19/0x60
mlxsw_sp_acl_tcam_region_destroy+0x49/0xf0
mlxsw_sp_acl_tcam_vregion_rehash_work+0x1f1/0x470
process_one_work+0x151/0x370
worker_thread+0x2cb/0x3e0
kthread+0xd0/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK> |