Description
In the Linux kernel, the following vulnerability has been resolved:

RDMA/mlx5: Fix UMR hang in LAG error state unload

During firmware reset in LAG mode, a race condition causes the driver
to hang indefinitely while waiting for UMR completion during device
unload. See [1].

In LAG mode the bond device is only registered on the master, so it
never sees sys_error events from the slave.
During firmware reset this causes UMR waits to hang forever on unload
as the slave is dead but the master hasn't entered error state yet, so
UMR posts succeed but completions never arrive.

Fix this by adding a sys_error notifier that gets registered before
MLX5_IB_STAGE_IB_REG and stays alive until after ib_unregister_device().
This ensures error events reach the bond device throughout teardown.

[1]
Call Trace:
__schedule+0x2bd/0x760
schedule+0x37/0xa0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.isra.6+0x2b5/0x4a0
__mlx5_ib_dereg_mr+0x606/0x870 [mlx5_ib]
? __xa_erase+0x4a/0xa0
? _cond_resched+0x15/0x30
? wait_for_completion+0x31/0x100
ib_dereg_mr_user+0x48/0xc0 [ib_core]
? rdmacg_uncharge_hierarchy+0xa0/0x100
destroy_hw_idr_uobject+0x20/0x50 [ib_uverbs]
uverbs_destroy_uobject+0x37/0x150 [ib_uverbs]
__uverbs_cleanup_ufile+0xda/0x140 [ib_uverbs]
uverbs_destroy_ufile_hw+0x3a/0xf0 [ib_uverbs]
ib_uverbs_remove_one+0xc3/0x140 [ib_uverbs]
remove_client_context+0x8b/0xd0 [ib_core]
disable_device+0x8c/0x130 [ib_core]
__ib_unregister_device+0x10d/0x180 [ib_core]
ib_unregister_device+0x21/0x30 [ib_core]
__mlx5_ib_remove+0x1e4/0x1f0 [mlx5_ib]
auxiliary_bus_remove+0x1e/0x30
device_release_driver_internal+0x103/0x1f0
bus_remove_device+0xf7/0x170
device_del+0x181/0x410
mlx5_rescan_drivers_locked.part.10+0xa9/0x1d0 [mlx5_core]
mlx5_disable_lag+0x253/0x260 [mlx5_core]
mlx5_lag_disable_change+0x89/0xc0 [mlx5_core]
mlx5_eswitch_disable+0x67/0xa0 [mlx5_core]
mlx5_unload+0x15/0xd0 [mlx5_core]
mlx5_unload_one+0x71/0xc0 [mlx5_core]
mlx5_sync_reset_reload_work+0x83/0x100 [mlx5_core]
process_one_work+0x1a7/0x360
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x116/0x130
? kthread_flush_work_fn+0x10/0x10
ret_from_fork+0x22/0x40
Published: 2026-05-27
Score: n/a
EPSS: n/a
KEV: No
Impact: n/a
Action: n/a
AI Analysis

Impact

A race condition in the Linux mlx5 RDMA driver causes the kernel to hang indefinitely when a device in Link Aggregation Group (LAG) mode is being unloaded after a firmware reset. The hang occurs while the driver waits for User-Mode Registration (UMR) completion; because the slave bond device is not registered on the master, the completion never arrives, locking up the kernel. This leads to a denial‑of‑service condition on the affected host.

Affected Systems

The flaw resides in the Linux kernel’s mlx5 driver, affecting all distributions that ship this driver when LAG bonding is enabled, regardless of vendor. No specific kernel release is listed, but any kernel containing the mlx5 implementation before the fix is susceptible. The patch is integrated into the kernel’s stable tree and referenced by the provided kernel commit links.

Risk and Exploitability

The vulnerability carries high availability risk because it can freeze the entire system. Attackers would need preliminary control of the kernel or the ability to trigger a firmware reset on a device in LAG mode; the data does not confirm a remote attack vector, so the primary risk is local. EPSS information is unavailable and the flaw is not in the CISA KEV catalog, but the nature of the kernel hang justifies a high severity posture. The fix adds a sys_error notifier to ensure error events propagate during teardown, preventing the hang.

Generated by OpenCVE AI on May 27, 2026 at 17:15 UTC.

Remediation

No vendor fix or workaround currently provided.

OpenCVE Recommended Actions

  • Update the Linux kernel to a version that includes the commit adding the sys_error notifier to the mlx5 driver.
  • Verify that the kernel no longer hangs during device unload in LAG mode after the update.
  • If a kernel update cannot be applied immediately, disable LAG mode or avoid performing firmware resets on affected RDMA devices until the patch is installed.

Generated by OpenCVE AI on May 27, 2026 at 17:15 UTC.

Tracking

Sign in to view the affected projects.

Advisories

No advisories yet.

History

Wed, 27 May 2026 17:30:00 +0000

Type Values Removed Values Added
Weaknesses CWE-362
CWE-754

Wed, 27 May 2026 14:15:00 +0000

Type Values Removed Values Added
Description In the Linux kernel, the following vulnerability has been resolved: RDMA/mlx5: Fix UMR hang in LAG error state unload During firmware reset in LAG mode, a race condition causes the driver to hang indefinitely while waiting for UMR completion during device unload. See [1]. In LAG mode the bond device is only registered on the master, so it never sees sys_error events from the slave. During firmware reset this causes UMR waits to hang forever on unload as the slave is dead but the master hasn't entered error state yet, so UMR posts succeed but completions never arrive. Fix this by adding a sys_error notifier that gets registered before MLX5_IB_STAGE_IB_REG and stays alive until after ib_unregister_device(). This ensures error events reach the bond device throughout teardown. [1] Call Trace: __schedule+0x2bd/0x760 schedule+0x37/0xa0 schedule_preempt_disabled+0xa/0x10 __mutex_lock.isra.6+0x2b5/0x4a0 __mlx5_ib_dereg_mr+0x606/0x870 [mlx5_ib] ? __xa_erase+0x4a/0xa0 ? _cond_resched+0x15/0x30 ? wait_for_completion+0x31/0x100 ib_dereg_mr_user+0x48/0xc0 [ib_core] ? rdmacg_uncharge_hierarchy+0xa0/0x100 destroy_hw_idr_uobject+0x20/0x50 [ib_uverbs] uverbs_destroy_uobject+0x37/0x150 [ib_uverbs] __uverbs_cleanup_ufile+0xda/0x140 [ib_uverbs] uverbs_destroy_ufile_hw+0x3a/0xf0 [ib_uverbs] ib_uverbs_remove_one+0xc3/0x140 [ib_uverbs] remove_client_context+0x8b/0xd0 [ib_core] disable_device+0x8c/0x130 [ib_core] __ib_unregister_device+0x10d/0x180 [ib_core] ib_unregister_device+0x21/0x30 [ib_core] __mlx5_ib_remove+0x1e4/0x1f0 [mlx5_ib] auxiliary_bus_remove+0x1e/0x30 device_release_driver_internal+0x103/0x1f0 bus_remove_device+0xf7/0x170 device_del+0x181/0x410 mlx5_rescan_drivers_locked.part.10+0xa9/0x1d0 [mlx5_core] mlx5_disable_lag+0x253/0x260 [mlx5_core] mlx5_lag_disable_change+0x89/0xc0 [mlx5_core] mlx5_eswitch_disable+0x67/0xa0 [mlx5_core] mlx5_unload+0x15/0xd0 [mlx5_core] mlx5_unload_one+0x71/0xc0 [mlx5_core] mlx5_sync_reset_reload_work+0x83/0x100 [mlx5_core] process_one_work+0x1a7/0x360 worker_thread+0x30/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x116/0x130 ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x22/0x40
Title RDMA/mlx5: Fix UMR hang in LAG error state unload
First Time appeared Linux
Linux linux Kernel
CPEs cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*
Vendors & Products Linux
Linux linux Kernel
References

Subscriptions

Linux Linux Kernel
cve-icon MITRE

Status: PUBLISHED

Assigner: Linux

Published:

Updated: 2026-05-27T12:18:32.270Z

Reserved: 2026-05-13T15:03:33.090Z

Link: CVE-2026-45973

cve-icon Vulnrichment

No data.

cve-icon NVD

Status : Awaiting Analysis

Published: 2026-05-27T14:17:14.293

Modified: 2026-05-27T14:48:03.013

Link: CVE-2026-45973

cve-icon Redhat

No data.

cve-icon OpenCVE Enrichment

Updated: 2026-05-27T21:30:34Z

Weaknesses