CVE | Vendors | Products | Updated | CVSS v3.1 |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Fix out-of-bounds read of df_v1_7_channel_number
Check the range of fb_channel_number to avoid an out-of-bounds
array read. |
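A minimal user-space sketch of the range check described above, not the kernel patch itself. The table contents, the function name `get_channel_number`, and the fallback value of 0 for an out-of-range index are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical lookup table standing in for df_v1_7_channel_number;
 * the values are illustrative only. */
static const uint32_t channel_number[] = {1, 2, 0, 4, 0, 8, 0, 16, 2};

/*
 * Translate a raw index read from hardware into a channel count.
 * The index is validated against the table size before it is used,
 * so a bogus register value cannot cause an out-of-bounds read.
 * Returning 0 for an invalid index is an assumption of this sketch.
 */
uint32_t get_channel_number(uint32_t fb_channel_number)
{
    if (fb_channel_number >= ARRAY_SIZE(channel_number))
        return 0;

    return channel_number[fb_channel_number];
}
```

The check uses `>=` against the element count rather than a hard-coded limit, so it stays correct if the table grows.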
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: fix ucode out-of-bounds read warning
Clear the warning that a read of ucode[] may be out of bounds. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: fix mc_data out-of-bounds read warning
Clear the warning that a read of mc_data[i-1] may be out of bounds. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: fix dereference after null check
Check the hive pointer before use. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Fix the null pointer dereference to ras_manager
Check ras_manager before using it |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc
Initialize size before calling amdgpu_vce_cs_reloc, for example in case 0x03000001.
V2: To really improve the handling we would actually
need to have a separate value of 0xffffffff. (Christian) |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: fix double free err_addr pointer warnings
In amdgpu_umc_bad_page_polling_timeout, amdgpu_umc_handle_bad_pages
is run many times, so err_addr can be freed twice in some special cases.
Set err_addr to NULL to avoid the warnings. |
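The pattern of clearing a pointer so that a repeated cleanup cannot free it twice can be sketched in user-space C as follows. The `struct err_data` layout and both function names are hypothetical stand-ins, and `calloc`/`free` replace the kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the structure that owns err_addr. */
struct err_data {
    unsigned long *err_addr;
    unsigned int count;
};

/* Allocate room for n addresses. Returns 0 on success, -1 on failure. */
int err_data_alloc(struct err_data *d, unsigned int n)
{
    d->err_addr = calloc(n, sizeof(*d->err_addr));
    d->count = d->err_addr ? n : 0;
    return d->err_addr ? 0 : -1;
}

/*
 * Release the buffer. Resetting err_addr to NULL makes a second call
 * harmless, because free(NULL) is defined to do nothing.
 */
void err_data_release(struct err_data *d)
{
    free(d->err_addr);
    d->err_addr = NULL;
    d->count = 0;
}
```

Without the assignment to NULL, a second call to `err_data_release()` would pass a stale pointer to `free()`.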
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: avoid using null object of framebuffer
Instead of using state->fb->obj[0] directly, get object from framebuffer
by calling drm_gem_fb_get_obj() and return error code when object is
null to avoid using null object of framebuffer. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: change vm->task_info handling
This patch changes the handling and lifecycle of the vm->task_info object.
The major changes are:
- vm->task_info is now a dynamically allocated pointer, and its usage is
  reference counted.
- Two new helper functions are introduced for task_info lifecycle management:
  - amdgpu_vm_get_task_info: increments the task_info reference count before
    returning this info
  - amdgpu_vm_put_task_info: decrements the task_info reference count
- The last put of task_info frees task_info from the vm.
This patch also makes the logistical changes required for existing users
of vm->task_info.
V2: Do not block all the prints when task_info not found (Felix)
V3: Fixed review comments from Felix
  - Fix wrong indentation
  - No debug message for -ENOMEM
  - Add NULL check for task_info
  - Do not duplicate the debug messages (ti vs no ti)
  - Get first reference of task_info in vm_init(), put last
    in vm_fini()
V4: Fixed review comments from Felix
  - fix double reference increment in create_task_info
  - change amdgpu_vm_get_task_info_pasid
  - additional changes in amdgpu_gem.c while porting |
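The get/put lifecycle described in this entry can be sketched roughly as below. This is a single-threaded user-space illustration, not the kernel code: the struct layouts and the names `vm_init`, `vm_get_task_info`, `task_info_put`, and `vm_fini` are hypothetical, and a plain `int` counter stands in for whatever atomic reference counting the driver uses.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct task_info {
    int refcount;
    int pid;
};

struct vm {
    struct task_info *task_info;
};

/* The vm takes the first reference when it is initialized. */
int vm_init(struct vm *vm, int pid)
{
    struct task_info *ti = calloc(1, sizeof(*ti));

    if (!ti)
        return -1;
    ti->refcount = 1;
    ti->pid = pid;
    vm->task_info = ti;
    return 0;
}

/* Take an extra reference before handing task_info to a caller. */
struct task_info *vm_get_task_info(struct vm *vm)
{
    if (!vm->task_info)
        return NULL;
    vm->task_info->refcount++;
    return vm->task_info;
}

/* Drop one reference. Returns 1 if this was the last put and ti was freed. */
int task_info_put(struct task_info *ti)
{
    if (!ti)
        return 0;
    if (--ti->refcount > 0)
        return 0;
    free(ti);
    return 1;
}

/* The vm drops its own reference when it is torn down. */
int vm_fini(struct vm *vm)
{
    struct task_info *ti = vm->task_info;

    vm->task_info = NULL;
    return task_info_put(ti);
}
```

A caller that obtained the object through `vm_get_task_info()` must balance it with one `task_info_put()`. The object is freed only when the final holder, which may be the vm itself in `vm_fini()`, drops its reference.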
In the Linux kernel, the following vulnerability has been resolved:
drm/amdkfd: range check cp bad op exception interrupts
Due to a CP interrupt bug, bad packet garbage exception codes are raised.
Do a range check so that the debugger and runtime do not receive garbage
codes.
Update the user api to guard exception code type checking as well. |
In the Linux kernel, the following vulnerability has been resolved:
amd/amdkfd: sync all devices to wait all processes being evicted
If more than one device is resetting in parallel, the first
device calls kfd_suspend_all_processes() to evict all processes
on all devices, and this call takes time to finish. The other devices
start their reset and recovery without waiting. If a process has not been
evicted before recovery starts, it is restored, which then causes a page
fault. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Skip do PCI error slot reset during RAS recovery
Why:
The PCI error slot reset may be triggered after injecting a UE into UMC multiple times, which
causes a system hang.
[ 557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 557.373718] [drm] PCIE GART of 512M enabled.
[ 557.373722] [drm] PTB located at 0x0000031FED700000
[ 557.373788] [drm] VRAM is lost due to GPU reset!
[ 557.373789] [drm] PSP is resuming...
[ 557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset
[ 557.547067] [drm] PCI error: detected callback, state(1)!!
[ 557.547069] [drm] No support for XGMI hive yet...
[ 557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter
[ 557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations
[ 557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. Exit, err = 0, result = 5, recovered
[ 557.610492] [drm] PCI error: slot reset callback!!
...
[ 560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded!
[ 560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded!
[ 560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI
[ 560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G OE 5.15.0-91-generic #101-Ubuntu
[ 560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023
[ 560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu]
[ 560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
[ 560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff <48> 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00
[ 560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202
[ 560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0
[ 560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010
[ 560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08
[ 560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000
[ 560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000
[ 560.803889] FS: 0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000
[ 560.812973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0
[ 560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 560.843444] PKRU: 55555554
[ 560.846480] Call Trace:
[ 560.849225] <TASK>
[ 560.851580] ? show_trace_log_lvl+0x1d6/0x2ea
[ 560.856488] ? show_trace_log_lvl+0x1d6/0x2ea
[ 560.861379] ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
[ 560.867778] ? show_regs.part.0+0x23/0x29
[ 560.872293] ? __die_body.cold+0x8/0xd
[ 560.876502] ? die_addr+0x3e/0x60
[ 560.880238] ? exc_general_protection+0x1c5/0x410
[ 560.885532] ? asm_exc_general_protection+0x27/0x30
[ 560.891025] ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
[ 560.898323] amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
[ 560.904520] process_one_work+0x228/0x3d0
How:
In RAS recovery, a mode-1 reset is issued from RAS fatal error handling and is expected
to reset all the nodes in a hive. There is no need to issue another mode-1 reset during this procedure. |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Reset IH OVERFLOW_CLEAR bit
Allows us to detect subsequent IH ring buffer overflows as well. |
In the Linux kernel, the following vulnerability has been resolved:
amdkfd: use calloc instead of kzalloc to avoid integer overflow
This uses calloc instead of doing the multiplication which might
overflow. |
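To illustrate why an array allocator is safer than an open-coded multiplication, here is a user-space sketch with the overflow check written out explicitly. The helper name `alloc_array_zeroed` is hypothetical, and libc `calloc` stands in for the kernel allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate a zeroed array of n elements of the given size.
 * If n * size would not fit in size_t, fail instead of letting the
 * product wrap around and silently allocating a buffer that is too small.
 */
void *alloc_array_zeroed(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;

    return calloc(n, size);
}
```

With an open-coded `n * size`, a large user-controlled `n` can wrap the product to a small value, so the allocation succeeds but later indexing runs past the end of the buffer.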
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Fix variable 'mca_funcs' dereferenced before NULL check in 'amdgpu_mca_smu_get_mca_entry()'
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c:377 amdgpu_mca_smu_get_mca_entry() warn: variable dereferenced before check 'mca_funcs' (see line 368)
357 int amdgpu_mca_smu_get_mca_entry(struct amdgpu_device *adev,
enum amdgpu_mca_error_type type,
358 int idx, struct mca_bank_entry *entry)
359 {
360 const struct amdgpu_mca_smu_funcs *mca_funcs =
adev->mca.mca_funcs;
361 int count;
362
363 switch (type) {
364 case AMDGPU_MCA_ERROR_TYPE_UE:
365 count = mca_funcs->max_ue_count;
mca_funcs is dereferenced here.
366 break;
367 case AMDGPU_MCA_ERROR_TYPE_CE:
368 count = mca_funcs->max_ce_count;
mca_funcs is dereferenced here.
369 break;
370 default:
371 return -EINVAL;
372 }
373
374 if (idx >= count)
375 return -EINVAL;
376
377 if (mca_funcs && mca_funcs->mca_get_mca_entry)
^^^^^^^^^
Checked too late! |
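One way to restructure such a function is to validate the pointer before the first dereference. The sketch below is a simplified user-space model, not the actual patch: the types, the `get_entry` callback, the `sample_get_entry` helper, and the choice of `-EOPNOTSUPP` for a missing function table are all assumptions.

```c
#include <errno.h>
#include <stddef.h>

enum error_type { ERROR_TYPE_UE, ERROR_TYPE_CE };

/* Hypothetical, reduced version of a per-ASIC function table. */
struct mca_funcs {
    int max_ue_count;
    int max_ce_count;
    int (*get_entry)(enum error_type type, int idx, int *entry);
};

/* Example callback used only to exercise the sketch. */
int sample_get_entry(enum error_type type, int idx, int *entry)
{
    (void)type;
    *entry = idx * 10;
    return 0;
}

int get_mca_entry(const struct mca_funcs *funcs, enum error_type type,
                  int idx, int *entry)
{
    int count;

    /* Validate the table before any field of it is read. */
    if (!funcs || !funcs->get_entry)
        return -EOPNOTSUPP;

    switch (type) {
    case ERROR_TYPE_UE:
        count = funcs->max_ue_count;
        break;
    case ERROR_TYPE_CE:
        count = funcs->max_ce_count;
        break;
    default:
        return -EINVAL;
    }

    if (idx < 0 || idx >= count)
        return -EINVAL;

    return funcs->get_entry(type, idx, entry);
}
```

Moving the NULL test to the top means every later use of `funcs` is already known to be safe, which is what the static checker warning asks for.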
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: fix use-after-free bug
The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl
to the AMDGPU DRM driver on any ASIC with an invalid address and size.
The bug was reported by Joonkyo Jung <joonkyoj@yonsei.ac.kr>.
For example, the following code:
static void Syzkaller1(int fd)
{
struct drm_amdgpu_gem_userptr arg;
int ret;
arg.addr = 0xffffffffffff0000;
arg.size = 0x80000000; /*2 Gb*/
arg.flags = 0x7;
ret = drmIoctl(fd, 0xc1186451/*amdgpu_gem_userptr_ioctl*/, &arg);
}
Because the address and size are not valid, there is a failure in
amdgpu_hmm_register->mmu_interval_notifier_insert->__mmu_interval_notifier_insert->
check_shl_overflow, but even after the amdgpu_hmm_register failure we still call
amdgpu_hmm_unregister in amdgpu_gem_object_free, which causes access to a bad address.
The following stack is produced when the issue is reproduced with KASAN enabled:
[ +0.000014] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
[ +0.000009] RIP: 0010:mmu_interval_notifier_remove+0x327/0x340
[ +0.000017] Code: ff ff 49 89 44 24 08 48 b8 00 01 00 00 00 00 ad de 4c 89 f7 49 89 47 40 48 83 c0 22 49 89 47 48 e8 ce d1 2d 01 e9 32 ff ff ff <0f> 0b e9 16 ff ff ff 4c 89 ef e8 fa 14 b3 ff e9 36 ff ff ff e8 80
[ +0.000014] RSP: 0018:ffffc90002657988 EFLAGS: 00010246
[ +0.000013] RAX: 0000000000000000 RBX: 1ffff920004caf35 RCX: ffffffff8160565b
[ +0.000011] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8881a9f78260
[ +0.000010] RBP: ffffc90002657a70 R08: 0000000000000001 R09: fffff520004caf25
[ +0.000010] R10: 0000000000000003 R11: ffffffff8161d1d6 R12: ffff88810e988c00
[ +0.000010] R13: ffff888126fb5a00 R14: ffff88810e988c0c R15: ffff8881a9f78260
[ +0.000011] FS: 00007ff9ec848540(0000) GS:ffff8883cc880000(0000) knlGS:0000000000000000
[ +0.000012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000010] CR2: 000055b3f7e14328 CR3: 00000001b5770000 CR4: 0000000000350ef0
[ +0.000010] Call Trace:
[ +0.000006] <TASK>
[ +0.000007] ? show_regs+0x6a/0x80
[ +0.000018] ? __warn+0xa5/0x1b0
[ +0.000019] ? mmu_interval_notifier_remove+0x327/0x340
[ +0.000018] ? report_bug+0x24a/0x290
[ +0.000022] ? handle_bug+0x46/0x90
[ +0.000015] ? exc_invalid_op+0x19/0x50
[ +0.000016] ? asm_exc_invalid_op+0x1b/0x20
[ +0.000017] ? kasan_save_stack+0x26/0x50
[ +0.000017] ? mmu_interval_notifier_remove+0x23b/0x340
[ +0.000019] ? mmu_interval_notifier_remove+0x327/0x340
[ +0.000019] ? mmu_interval_notifier_remove+0x23b/0x340
[ +0.000020] ? __pfx_mmu_interval_notifier_remove+0x10/0x10
[ +0.000017] ? kasan_save_alloc_info+0x1e/0x30
[ +0.000018] ? srso_return_thunk+0x5/0x5f
[ +0.000014] ? __kasan_kmalloc+0xb1/0xc0
[ +0.000018] ? srso_return_thunk+0x5/0x5f
[ +0.000013] ? __kasan_check_read+0x11/0x20
[ +0.000020] amdgpu_hmm_unregister+0x34/0x50 [amdgpu]
[ +0.004695] amdgpu_gem_object_free+0x66/0xa0 [amdgpu]
[ +0.004534] ? __pfx_amdgpu_gem_object_free+0x10/0x10 [amdgpu]
[ +0.004291] ? do_syscall_64+0x5f/0xe0
[ +0.000023] ? srso_return_thunk+0x5/0x5f
[ +0.000017] drm_gem_object_free+0x3b/0x50 [drm]
[ +0.000489] amdgpu_gem_userptr_ioctl+0x306/0x500 [amdgpu]
[ +0.004295] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
[ +0.004270] ? srso_return_thunk+0x5/0x5f
[ +0.000014] ? __this_cpu_preempt_check+0x13/0x20
[ +0.000015] ? srso_return_thunk+0x5/0x5f
[ +0.000013] ? sysvec_apic_timer_interrupt+0x57/0xc0
[ +0.000020] ? srso_return_thunk+0x5/0x5f
[ +0.000014] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ +0.000022] ? drm_ioctl_kernel+0x17b/0x1f0 [drm]
[ +0.000496] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
[ +0.004272] ? drm_ioctl_kernel+0x190/0x1f0 [drm]
[ +0.000492] drm_ioctl_kernel+0x140/0x1f0 [drm]
[ +0.000497] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
[ +0.004297] ? __pfx_drm_ioctl_kernel+0x10/0x10 [d
---truncated--- |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdkfd: Fix an illegal memory access
In the kfd_wait_on_events() function, the kfd_event_waiter structure is
allocated by alloc_event_waiters(), but the event field of the waiter
structure is not initialized. When copy_from_user() fails in
kfd_wait_on_events(), the function enters its error handling path to
release the previously allocated waiter memory. Because free_waiters()
accesses the event field of the waiter structure, this results in an
illegal memory access and a system crash. Here is the crash log:
localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
localhost kernel: RSP: 0018:ffffaa53c362bd60 EFLAGS: 00010082
localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0000000000000282 RCX: 00000000002c0000
localhost kernel: RDX: ffff9e855eeacb80 RSI: 000000000000279c RDI: ffffe7088f6a21d0
localhost kernel: RBP: ffffe7088f6a21d0 R08: 00000000002c0000 R09: ffffaa53c362be64
localhost kernel: R10: ffffaa53c362bbd8 R11: 0000000000000001 R12: 0000000000000002
localhost kernel: R13: ffff9e7ead15d600 R14: 0000000000000000 R15: ffff9e7ead15d698
localhost kernel: FS: 0000152a3d111700(0000) GS:ffff9e855ee80000(0000) knlGS:0000000000000000
localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
localhost kernel: CR2: 0000152938000010 CR3: 000000044d7a4000 CR4: 00000000003506e0
localhost kernel: Call Trace:
localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
localhost kernel: remove_wait_queue+0x12/0x50
localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
localhost kernel: __x64_sys_ioctl+0x8e/0xd0
localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
localhost kernel: do_syscall_64+0x33/0x80
localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
localhost kernel: RIP: 0033:0x152a4dff68d7
Allocate the structure with kcalloc, and remove redundant 0-initialization
and a redundant loop condition check. |
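The effect of zero-initializing the waiter array can be sketched in user-space C as follows. The struct layouts are hypothetical reductions of the real ones, `calloc` stands in for kcalloc, and the sketch assumes the common case where an all-zero-bits pointer is a null pointer.

```c
#include <stdlib.h>

/* Hypothetical, reduced stand-ins for the KFD structures. */
struct event {
    int waiters;
};

struct event_waiter {
    struct event *event;
    int activated;
};

/* Zero-filled allocation: every waiter starts with event == NULL. */
struct event_waiter *alloc_event_waiters(size_t n)
{
    return calloc(n, sizeof(struct event_waiter));
}

/*
 * Detach each waiter that was actually attached to an event, then free
 * the array. Because the array was zeroed, this is safe to call from an
 * error path that runs before any event pointer has been filled in.
 * Returns the number of waiters that were detached.
 */
size_t free_waiters(struct event_waiter *waiters, size_t n)
{
    size_t detached = 0;

    for (size_t i = 0; i < n; i++) {
        if (waiters[i].event) {
            waiters[i].event->waiters--;
            detached++;
        }
    }
    free(waiters);
    return detached;
}
```

If the array came from an allocator that does not zero memory, the `if (waiters[i].event)` test would read garbage and could follow a wild pointer, which matches the crash described above.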
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: fix ttm_bo calltrace warning in psp_hw_fini
The call trace occurs when amdgpu is removed after
a mode1 reset. During a mode1 reset, from suspend to resume,
there is no need to reinitialize the TA firmware buffer;
doing so caused the BO pin_count to be incremented redundantly.
[ 489.885525] Call Trace:
[ 489.885525] <TASK>
[ 489.885526] amdttm_bo_put+0x34/0x50 [amdttm]
[ 489.885529] amdgpu_bo_free_kernel+0xe8/0x130 [amdgpu]
[ 489.885620] psp_free_shared_bufs+0xb7/0x150 [amdgpu]
[ 489.885720] psp_hw_fini+0xce/0x170 [amdgpu]
[ 489.885815] amdgpu_device_fini_hw+0x2ff/0x413 [amdgpu]
[ 489.885960] ? blocking_notifier_chain_unregister+0x56/0xb0
[ 489.885962] amdgpu_driver_unload_kms+0x51/0x60 [amdgpu]
[ 489.886049] amdgpu_pci_remove+0x5a/0x140 [amdgpu]
[ 489.886132] ? __pm_runtime_resume+0x60/0x90
[ 489.886134] pci_device_remove+0x3e/0xb0
[ 489.886135] __device_release_driver+0x1ab/0x2a0
[ 489.886137] driver_detach+0xf3/0x140
[ 489.886138] bus_remove_driver+0x6c/0xf0
[ 489.886140] driver_unregister+0x31/0x60
[ 489.886141] pci_unregister_driver+0x40/0x90
[ 489.886142] amdgpu_exit+0x15/0x451 [amdgpu] |
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu/vkms: fix a possible null pointer dereference
In amdgpu_vkms_conn_get_modes(), the return value of drm_cvt_mode()
is assigned to mode, which will lead to a NULL pointer dereference
on failure of drm_cvt_mode(). Add a check to avoid null pointer
dereference. |
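A simplified user-space sketch of checking a constructor's result before using it. `create_mode` is a hypothetical stand-in for a mode constructor that can fail, and `add_mode` and `struct mode` are illustrative names, not DRM APIs.

```c
#include <stdlib.h>

/* Hypothetical, reduced display mode. */
struct mode {
    int hdisplay;
    int vdisplay;
    int vrefresh;
};

/* Stand-in for a constructor that returns NULL on failure. */
struct mode *create_mode(int width, int height, int refresh)
{
    struct mode *m;

    if (width <= 0 || height <= 0 || refresh <= 0)
        return NULL;

    m = malloc(sizeof(*m));
    if (!m)
        return NULL;

    m->hdisplay = width;
    m->vdisplay = height;
    m->vrefresh = refresh;
    return m;
}

/*
 * Try to append one mode to list. Returns the new element count.
 * A failed construction is skipped rather than dereferenced.
 */
int add_mode(struct mode **list, int count, int capacity, int width, int height)
{
    struct mode *m;

    if (count >= capacity)
        return count;

    m = create_mode(width, height, 60);
    if (!m)
        return count;

    list[count] = m;
    return count + 1;
}
```

Skipping the failed entry keeps the remaining modes usable. Whether to skip or to abort with an error code is a design choice that depends on the caller.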
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Fix possible NULL dereference in amdgpu_ras_query_error_status_helper()
Return the error code -EINVAL for an invalid block id.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1183 amdgpu_ras_query_error_status_helper() error: we previously assumed 'info' could be null (see line 1176) |