CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: fix dereference after null check
check the pointer hive before use. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Fix the null pointer dereference to ras_manager
Check ras_manager before using it |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc
Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x03000001.
V2: To really improve the handling we would actually
need to have a separate value of 0xffffffff.(Christian) |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: fix double free err_addr pointer warnings
In amdgpu_umc_bad_page_polling_timeout, the amdgpu_umc_handle_bad_pages
will be run many times so that double free err_addr in some special case.
So set the err_addr to NULL to avoid the warnings. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: avoid using null object of framebuffer
Instead of using state->fb->obj[0] directly, get object from framebuffer
by calling drm_gem_fb_get_obj() and return error code when object is
null to avoid using null object of framebuffer. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: change vm->task_info handling
This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its uasge is
reference counted.
- introducing two new helper funcs for task_info lifecycle management
- amdgpu_vm_get_task_info: reference counts up task_info before
returning this info
- amdgpu_vm_put_task_info: reference counts down task_info
- last put to task_info() frees task_info from the vm.
This patch also does logistical changes required for existing usage
of vm->task_info.
V2: Do not block all the prints when task_info not found (Felix)
V3: Fixed review comments from Felix
- Fix wrong indentation
- No debug message for -ENOMEM
- Add NULL check for task_info
- Do not duplicate the debug messages (ti vs no ti)
- Get first reference of task_info in vm_init(), put last
in vm_fini()
V4: Fixed review comments from Felix
- fix double reference increment in create_task_info
- change amdgpu_vm_get_task_info_pasid
- additional changes in amdgpu_gem.c while porting |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdkfd: range check cp bad op exception interrupts
Due to a CP interrupt bug, bad packet garbage exception codes are raised.
Do a range check so that the debugger and runtime do not receive garbage
codes.
Update the user api to guard exception code type checking as well. |
In the Linux kernel, the following vulnerability has been resolved:
amd/amdkfd: sync all devices to wait all processes being evicted
If there are more than one device doing reset in parallel, the first
device will call kfd_suspend_all_processes() to evict all processes
on all devices, this call takes time to finish. other device will
start reset and recover without waiting. if the process has not been
evicted before doing recover, it will be restored, then caused page
fault. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Skip do PCI error slot reset during RAS recovery
Why:
The PCI error slot reset maybe triggered after inject ue to UMC multi times, this
caused system hang.
[ 557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 557.373718] [drm] PCIE GART of 512M enabled.
[ 557.373722] [drm] PTB located at 0x0000031FED700000
[ 557.373788] [drm] VRAM is lost due to GPU reset!
[ 557.373789] [drm] PSP is resuming...
[ 557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset
[ 557.547067] [drm] PCI error: detected callback, state(1)!!
[ 557.547069] [drm] No support for XGMI hive yet...
[ 557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter
[ 557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations
[ 557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. Exit, err = 0, result = 5, recovered
[ 557.610492] [drm] PCI error: slot reset callback!!
...
[ 560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded!
[ 560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded!
[ 560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI
[ 560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G OE 5.15.0-91-generic #101-Ubuntu
[ 560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023
[ 560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu]
[ 560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
[ 560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff <48> 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00
[ 560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202
[ 560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0
[ 560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010
[ 560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08
[ 560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000
[ 560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000
[ 560.803889] FS: 0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000
[ 560.812973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0
[ 560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 560.843444] PKRU: 55555554
[ 560.846480] Call Trace:
[ 560.849225] <TASK>
[ 560.851580] ? show_trace_log_lvl+0x1d6/0x2ea
[ 560.856488] ? show_trace_log_lvl+0x1d6/0x2ea
[ 560.861379] ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
[ 560.867778] ? show_regs.part.0+0x23/0x29
[ 560.872293] ? __die_body.cold+0x8/0xd
[ 560.876502] ? die_addr+0x3e/0x60
[ 560.880238] ? exc_general_protection+0x1c5/0x410
[ 560.885532] ? asm_exc_general_protection+0x27/0x30
[ 560.891025] ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
[ 560.898323] amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
[ 560.904520] process_one_work+0x228/0x3d0
How:
In RAS recovery, mode-1 reset is issued from RAS fatal error handling and expected
all the nodes in a hive to be reset. no need to issue another mode-1 during this procedure. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Reset IH OVERFLOW_CLEAR bit
Allows us to detect subsequent IH ring buffer overflows as well. |
In the Linux kernel, the following vulnerability has been resolved:
amdkfd: use calloc instead of kzalloc to avoid integer overflow
This uses calloc instead of doing the multiplication which might
overflow. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Fix variable 'mca_funcs' dereferenced before NULL check in 'amdgpu_mca_smu_get_mca_entry()'
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c:377 amdgpu_mca_smu_get_mca_entry() warn: variable dereferenced before check 'mca_funcs' (see line 368)
357 int amdgpu_mca_smu_get_mca_entry(struct amdgpu_device *adev,
enum amdgpu_mca_error_type type,
358 int idx, struct mca_bank_entry *entry)
359 {
360 const struct amdgpu_mca_smu_funcs *mca_funcs =
adev->mca.mca_funcs;
361 int count;
362
363 switch (type) {
364 case AMDGPU_MCA_ERROR_TYPE_UE:
365 count = mca_funcs->max_ue_count;
mca_funcs is dereferenced here.
366 break;
367 case AMDGPU_MCA_ERROR_TYPE_CE:
368 count = mca_funcs->max_ce_count;
mca_funcs is dereferenced here.
369 break;
370 default:
371 return -EINVAL;
372 }
373
374 if (idx >= count)
375 return -EINVAL;
376
377 if (mca_funcs && mca_funcs->mca_get_mca_entry)
^^^^^^^^^
Checked too late! |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: fix use-after-free bug
The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl
to the AMDGPU DRM driver on any ASICs with an invalid address and size.
The bug was reported by Joonkyo Jung <joonkyoj@yonsei.ac.kr>.
For example the following code:
static void Syzkaller1(int fd)
{
struct drm_amdgpu_gem_userptr arg;
int ret;
arg.addr = 0xffffffffffff0000;
arg.size = 0x80000000; /*2 Gb*/
arg.flags = 0x7;
ret = drmIoctl(fd, 0xc1186451/*amdgpu_gem_userptr_ioctl*/, &arg);
}
Due to the address and size are not valid there is a failure in
amdgpu_hmm_register->mmu_interval_notifier_insert->__mmu_interval_notifier_insert->
check_shl_overflow, but we even the amdgpu_hmm_register failure we still call
amdgpu_hmm_unregister into amdgpu_gem_object_free which causes access to a bad address.
The following stack is below when the issue is reproduced when Kazan is enabled:
[ +0.000014] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
[ +0.000009] RIP: 0010:mmu_interval_notifier_remove+0x327/0x340
[ +0.000017] Code: ff ff 49 89 44 24 08 48 b8 00 01 00 00 00 00 ad de 4c 89 f7 49 89 47 40 48 83 c0 22 49 89 47 48 e8 ce d1 2d 01 e9 32 ff ff ff <0f> 0b e9 16 ff ff ff 4c 89 ef e8 fa 14 b3 ff e9 36 ff ff ff e8 80
[ +0.000014] RSP: 0018:ffffc90002657988 EFLAGS: 00010246
[ +0.000013] RAX: 0000000000000000 RBX: 1ffff920004caf35 RCX: ffffffff8160565b
[ +0.000011] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8881a9f78260
[ +0.000010] RBP: ffffc90002657a70 R08: 0000000000000001 R09: fffff520004caf25
[ +0.000010] R10: 0000000000000003 R11: ffffffff8161d1d6 R12: ffff88810e988c00
[ +0.000010] R13: ffff888126fb5a00 R14: ffff88810e988c0c R15: ffff8881a9f78260
[ +0.000011] FS: 00007ff9ec848540(0000) GS:ffff8883cc880000(0000) knlGS:0000000000000000
[ +0.000012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000010] CR2: 000055b3f7e14328 CR3: 00000001b5770000 CR4: 0000000000350ef0
[ +0.000010] Call Trace:
[ +0.000006] <TASK>
[ +0.000007] ? show_regs+0x6a/0x80
[ +0.000018] ? __warn+0xa5/0x1b0
[ +0.000019] ? mmu_interval_notifier_remove+0x327/0x340
[ +0.000018] ? report_bug+0x24a/0x290
[ +0.000022] ? handle_bug+0x46/0x90
[ +0.000015] ? exc_invalid_op+0x19/0x50
[ +0.000016] ? asm_exc_invalid_op+0x1b/0x20
[ +0.000017] ? kasan_save_stack+0x26/0x50
[ +0.000017] ? mmu_interval_notifier_remove+0x23b/0x340
[ +0.000019] ? mmu_interval_notifier_remove+0x327/0x340
[ +0.000019] ? mmu_interval_notifier_remove+0x23b/0x340
[ +0.000020] ? __pfx_mmu_interval_notifier_remove+0x10/0x10
[ +0.000017] ? kasan_save_alloc_info+0x1e/0x30
[ +0.000018] ? srso_return_thunk+0x5/0x5f
[ +0.000014] ? __kasan_kmalloc+0xb1/0xc0
[ +0.000018] ? srso_return_thunk+0x5/0x5f
[ +0.000013] ? __kasan_check_read+0x11/0x20
[ +0.000020] amdgpu_hmm_unregister+0x34/0x50 [amdgpu]
[ +0.004695] amdgpu_gem_object_free+0x66/0xa0 [amdgpu]
[ +0.004534] ? __pfx_amdgpu_gem_object_free+0x10/0x10 [amdgpu]
[ +0.004291] ? do_syscall_64+0x5f/0xe0
[ +0.000023] ? srso_return_thunk+0x5/0x5f
[ +0.000017] drm_gem_object_free+0x3b/0x50 [drm]
[ +0.000489] amdgpu_gem_userptr_ioctl+0x306/0x500 [amdgpu]
[ +0.004295] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
[ +0.004270] ? srso_return_thunk+0x5/0x5f
[ +0.000014] ? __this_cpu_preempt_check+0x13/0x20
[ +0.000015] ? srso_return_thunk+0x5/0x5f
[ +0.000013] ? sysvec_apic_timer_interrupt+0x57/0xc0
[ +0.000020] ? srso_return_thunk+0x5/0x5f
[ +0.000014] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ +0.000022] ? drm_ioctl_kernel+0x17b/0x1f0 [drm]
[ +0.000496] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
[ +0.004272] ? drm_ioctl_kernel+0x190/0x1f0 [drm]
[ +0.000492] drm_ioctl_kernel+0x140/0x1f0 [drm]
[ +0.000497] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
[ +0.004297] ? __pfx_drm_ioctl_kernel+0x10/0x10 [d
---truncated--- |
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Fix possible underflow for displays with large vblank
[Why]
Underflow observed when using a display with a large vblank region
and low refresh rate
[How]
Simplify calculation of vblank_nom
Increase value for VBlankNomDefaultUS to 800us |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: install stub fence into potential unused fence pointers
When using cpu to update page tables, vm update fences are unused.
Install stub fence into these fence pointers instead of NULL
to avoid NULL dereference when calling dma_fence_wait() on them. |
In the Linux kernel, the following vulnerability has been resolved:
erofs: Fix detection of atomic context
Current check for atomic context is not sufficient as
z_erofs_decompressqueue_endio can be called under rcu lock
from blk_mq_flush_plug_list(). See the stacktrace [1]
In such case we should hand off the decompression work for async
processing rather than trying to do sync decompression in current
context. Patch fixes the detection by checking for
rcu_read_lock_any_held() and while at it use more appropriate
!in_task() check than in_atomic().
Background: Historically erofs would always schedule a kworker for
decompression which would incur the scheduling cost regardless of
the context. But z_erofs_decompressqueue_endio() may not always
be in atomic context and we could actually benefit from doing the
decompression in z_erofs_decompressqueue_endio() if we are in
thread context, for example when running with dm-verity.
This optimization was later added in patch [2] which has shown
improvement in performance benchmarks.
==============================================
[1] Problem stacktrace
[name:core&]BUG: sleeping function called from invalid context at kernel/locking/mutex.c:291
[name:core&]in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1615, name: CpuMonitorServi
[name:core&]preempt_count: 0, expected: 0
[name:core&]RCU nest depth: 1, expected: 0
CPU: 7 PID: 1615 Comm: CpuMonitorServi Tainted: G S W OE 6.1.25-android14-5-maybe-dirty-mainline #1
Hardware name: MT6897 (DT)
Call trace:
dump_backtrace+0x108/0x15c
show_stack+0x20/0x30
dump_stack_lvl+0x6c/0x8c
dump_stack+0x20/0x48
__might_resched+0x1fc/0x308
__might_sleep+0x50/0x88
mutex_lock+0x2c/0x110
z_erofs_decompress_queue+0x11c/0xc10
z_erofs_decompress_kickoff+0x110/0x1a4
z_erofs_decompressqueue_endio+0x154/0x180
bio_endio+0x1b0/0x1d8
__dm_io_complete+0x22c/0x280
clone_endio+0xe4/0x280
bio_endio+0x1b0/0x1d8
blk_update_request+0x138/0x3a4
blk_mq_plug_issue_direct+0xd4/0x19c
blk_mq_flush_plug_list+0x2b0/0x354
__blk_flush_plug+0x110/0x160
blk_finish_plug+0x30/0x4c
read_pages+0x2fc/0x370
page_cache_ra_unbounded+0xa4/0x23c
page_cache_ra_order+0x290/0x320
do_sync_mmap_readahead+0x108/0x2c0
filemap_fault+0x19c/0x52c
__do_fault+0xc4/0x114
handle_mm_fault+0x5b4/0x1168
do_page_fault+0x338/0x4b4
do_translation_fault+0x40/0x60
do_mem_abort+0x60/0xc8
el0_da+0x4c/0xe0
el0t_64_sync_handler+0xd4/0xfc
el0t_64_sync+0x1a0/0x1a4
[2] Link: https://lore.kernel.org/all/20210317035448.13921-1-huangjianan@oppo.com/ |
In the Linux kernel, the following vulnerability has been resolved:
fs/ntfs3: Add length check in indx_get_root
This adds a length check to guarantee the retrieved index root is legit.
[ 162.459513] BUG: KASAN: use-after-free in hdr_find_e.isra.0+0x10c/0x320
[ 162.460176] Read of size 2 at addr ffff8880037bca99 by task mount/243
[ 162.460851]
[ 162.461252] CPU: 0 PID: 243 Comm: mount Not tainted 6.0.0-rc7 #42
[ 162.461744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 162.462609] Call Trace:
[ 162.462954] <TASK>
[ 162.463276] dump_stack_lvl+0x49/0x63
[ 162.463822] print_report.cold+0xf5/0x689
[ 162.464608] ? unwind_get_return_address+0x3a/0x60
[ 162.465766] ? hdr_find_e.isra.0+0x10c/0x320
[ 162.466975] kasan_report+0xa7/0x130
[ 162.467506] ? _raw_spin_lock_irq+0xc0/0xf0
[ 162.467998] ? hdr_find_e.isra.0+0x10c/0x320
[ 162.468536] __asan_load2+0x68/0x90
[ 162.468923] hdr_find_e.isra.0+0x10c/0x320
[ 162.469282] ? cmp_uints+0xe0/0xe0
[ 162.469557] ? cmp_sdh+0x90/0x90
[ 162.469864] ? ni_find_attr+0x214/0x300
[ 162.470217] ? ni_load_mi+0x80/0x80
[ 162.470479] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 162.470931] ? ntfs_bread_run+0x190/0x190
[ 162.471307] ? indx_get_root+0xe4/0x190
[ 162.471556] ? indx_get_root+0x140/0x190
[ 162.471833] ? indx_init+0x1e0/0x1e0
[ 162.472069] ? fnd_clear+0x115/0x140
[ 162.472363] ? _raw_spin_lock_irqsave+0x100/0x100
[ 162.472731] indx_find+0x184/0x470
[ 162.473461] ? sysvec_apic_timer_interrupt+0x57/0xc0
[ 162.474429] ? indx_find_buffer+0x2d0/0x2d0
[ 162.474704] ? do_syscall_64+0x3b/0x90
[ 162.474962] dir_search_u+0x196/0x2f0
[ 162.475381] ? ntfs_nls_to_utf16+0x450/0x450
[ 162.475661] ? ntfs_security_init+0x3d6/0x440
[ 162.475906] ? is_sd_valid+0x180/0x180
[ 162.476191] ntfs_extend_init+0x13f/0x2c0
[ 162.476496] ? ntfs_fix_post_read+0x130/0x130
[ 162.476861] ? iput.part.0+0x286/0x320
[ 162.477325] ntfs_fill_super+0x11e0/0x1b50
[ 162.477709] ? put_ntfs+0x1d0/0x1d0
[ 162.477970] ? vsprintf+0x20/0x20
[ 162.478258] ? set_blocksize+0x95/0x150
[ 162.478538] get_tree_bdev+0x232/0x370
[ 162.478789] ? put_ntfs+0x1d0/0x1d0
[ 162.479038] ntfs_fs_get_tree+0x15/0x20
[ 162.479374] vfs_get_tree+0x4c/0x130
[ 162.479729] path_mount+0x654/0xfe0
[ 162.480124] ? putname+0x80/0xa0
[ 162.480484] ? finish_automount+0x2e0/0x2e0
[ 162.480894] ? putname+0x80/0xa0
[ 162.481467] ? kmem_cache_free+0x1c4/0x440
[ 162.482280] ? putname+0x80/0xa0
[ 162.482714] do_mount+0xd6/0xf0
[ 162.483264] ? path_mount+0xfe0/0xfe0
[ 162.484782] ? __kasan_check_write+0x14/0x20
[ 162.485593] __x64_sys_mount+0xca/0x110
[ 162.486024] do_syscall_64+0x3b/0x90
[ 162.486543] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 162.487141] RIP: 0033:0x7f9d374e948a
[ 162.488324] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 162.489728] RSP: 002b:00007ffe30e73d18 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 162.490971] RAX: ffffffffffffffda RBX: 0000561cdb43a060 RCX: 00007f9d374e948a
[ 162.491669] RDX: 0000561cdb43a260 RSI: 0000561cdb43a2e0 RDI: 0000561cdb442af0
[ 162.492050] RBP: 0000000000000000 R08: 0000561cdb43a280 R09: 0000000000000020
[ 162.492459] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000561cdb442af0
[ 162.493183] R13: 0000561cdb43a260 R14: 0000000000000000 R15: 00000000ffffffff
[ 162.493644] </TASK>
[ 162.493908]
[ 162.494214] The buggy address belongs to the physical page:
[ 162.494761] page:000000003e38a3d5 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x37bc
[ 162.496064] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
[ 162.497278] raw: 000fffffc0000000 ffffea00000df1c8 ffffea00000df008 0000000000000000
[ 162.498928] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000
[ 162.500542] page dumped becau
---truncated--- |
In the Linux kernel, the following vulnerability has been resolved:
wifi: ath12k: Avoid NULL pointer access during management transmit cleanup
Currently 'ar' reference is not added in skb_cb.
Though this is generally not used during transmit completion
callbacks, on interface removal the remaining idr cleanup callback
uses the ar pointer from skb_cb from management txmgmt_idr. Hence fill them
during transmit call for proper usage to avoid NULL pointer dereference.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 |
In the Linux kernel, the following vulnerability has been resolved:
mm: fix zswap writeback race condition
The zswap writeback mechanism can cause a race condition resulting in
memory corruption, where a swapped out page gets swapped in with data that
was written to a different page.
The race unfolds like this:
1. a page with data A and swap offset X is stored in zswap
2. page A is removed off the LRU by zpool driver for writeback in
zswap-shrink work, data for A is mapped by zpool driver
3. user space program faults and invalidates page entry A, offset X is
considered free
4. kswapd stores page B at offset X in zswap (zswap could also be
full, if so, page B would then be IOed to X, then skip step 5.)
5. entry A is replaced by B in tree->rbroot, this doesn't affect the
local reference held by zswap-shrink work
6. zswap-shrink work writes back A at X, and frees zswap entry A
7. swapin of slot X brings A in memory instead of B
The fix:
Once the swap page cache has been allocated (case ZSWAP_SWAPCACHE_NEW),
zswap-shrink work just checks that the local zswap_entry reference is
still the same as the one in the tree. If it's not the same it means that
it's either been invalidated or replaced, in both cases the writeback is
aborted because the local entry contains stale data.
Reproducer:
I originally found this by running `stress` overnight to validate my work
on the zswap writeback mechanism, it manifested after hours on my test
machine. The key to make it happen is having zswap writebacks, so
whatever setup pumps /sys/kernel/debug/zswap/written_back_pages should do
the trick.
In order to reproduce this faster on a vm, I setup a system with ~100M of
available memory and a 500M swap file, then running `stress --vm 1
--vm-bytes 300000000 --vm-stride 4000` makes it happen in matter of tens
of minutes. One can speed things up even more by swinging
/sys/module/zswap/parameters/max_pool_percent up and down between, say, 20
and 1; this makes it reproduce in tens of seconds. It's crucial to set
`--vm-stride` to something other than 4096 otherwise `stress` won't
realize that memory has been corrupted because all pages would have the
same data. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: fix calltrace warning in amddrm_buddy_fini
The following call trace is observed when removing the amdgpu driver, which
is caused by that BOs allocated for psp are not freed until removing.
[61811.450562] RIP: 0010:amddrm_buddy_fini.cold+0x29/0x47 [amddrm_buddy]
[61811.450577] Call Trace:
[61811.450577] <TASK>
[61811.450579] amdgpu_vram_mgr_fini+0x135/0x1c0 [amdgpu]
[61811.450728] amdgpu_ttm_fini+0x207/0x290 [amdgpu]
[61811.450870] amdgpu_bo_fini+0x27/0xa0 [amdgpu]
[61811.451012] gmc_v9_0_sw_fini+0x4a/0x60 [amdgpu]
[61811.451166] amdgpu_device_fini_sw+0x117/0x520 [amdgpu]
[61811.451306] amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
[61811.451447] devm_drm_dev_init_release+0x4d/0x80 [drm]
[61811.451466] devm_action_release+0x15/0x20
[61811.451469] release_nodes+0x40/0xb0
[61811.451471] devres_release_all+0x9b/0xd0
[61811.451473] __device_release_driver+0x1bb/0x2a0
[61811.451476] driver_detach+0xf3/0x140
[61811.451479] bus_remove_driver+0x6c/0xf0
[61811.451481] driver_unregister+0x31/0x60
[61811.451483] pci_unregister_driver+0x40/0x90
[61811.451486] amdgpu_exit+0x15/0x447 [amdgpu]
For smu v13_0_2, if the GPU supports xgmi, refer to
commit f5c7e7797060 ("drm/amdgpu: Adjust removal control flow for smu v13_0_2"),
it will run gpu recover in AMDGPU_RESET_FOR_DEVICE_REMOVE mode when removing,
which makes all devices in hive list have hw reset but no resume except the
basic ip blocks, then other ip blocks will not call .hw_fini according to
ip_block.status.hw.
Since psp_free_shared_bufs just includes some software operations, so move
it to psp_sw_fini. |