CVE-2026-53923 - Vulnerability Details

- vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow

Description

vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
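The partial processing can be illustrated with the arithmetic alone. This C++ sketch makes two assumptions. First, narrowing the count keeps its low 32 bits, which is the usual two's-complement behavior and is guaranteed for integral conversions since C++20. Second, the dequantize kernel is modeled as a plain loop over the truncated count. The function names are hypothetical and are not taken from gguf_kernel.cu.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Models narrowing a 64-bit element count to a 32-bit signed int: only the
// low 32 bits survive, reinterpreted as two's complement. Computed with
// explicit arithmetic so the result is well defined in any C++ standard.
int64_t truncate_to_int32(int64_t n) {
    int64_t low = n & 0xFFFFFFFFLL;  // keep the low 32 bits
    if (low > std::numeric_limits<int32_t>::max()) {
        low -= int64_t{1} << 32;     // wrap into the negative range
    }
    return low;
}

// Elements a hypothetical kernel loop such as
//   for (int i = 0; i < n; ++i) { ... }
// would write if handed the truncated count n instead of the real numel.
// A negative n means the loop body never runs.
int64_t elements_written(int64_t numel) {
    assert(numel >= 0);
    const int64_t n = truncate_to_int32(numel);
    return n < 0 ? 0 : n;
}

// Elements of a full-size output buffer that are never written and so keep
// whatever was previously stored in that memory.
int64_t elements_left_stale(int64_t numel) {
    return numel - elements_written(numel);
}
```

Under these assumptions, a tensor with 2^32 + 100 elements is processed as if it had only 100. That leaves 2^32 elements of the full-size output unwritten. A count between 2^31 and 2^32 - 1 wraps to a negative value, so nothing is written at all.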

Published: 2026-06-22

Score: 5.3 Medium (CVSS v4.0)

EPSS: < 1% Very Low

KEV: No

Analysis

Impact

vLLM’s GGUF dequantize kernels truncate 64-bit (int64_t) tensor dimensions to a 32-bit int. The output tensor is still allocated at its full size with torch::empty, which does not initialize memory, but the CUDA kernel processes only the truncated number of elements. The rest of the tensor keeps whatever data previously occupied that GPU memory. In a multi‑tenant inference setup, this stale memory can contain tensor data belonging to other users, which an attacker may be able to read. The vulnerability combines a numeric truncation flaw (CWE‑681) with an information disclosure weakness (CWE‑200).
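The disclosure path can be modeled on the CPU. In this C++ sketch, a std::vector stands in for GPU memory. Copying an earlier buffer stands in for an allocator that returns a previously used block without clearing it. This is a simplification: torch::empty promises nothing about the memory's contents beyond leaving it uninitialized. dequantize_partial and run_request are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the dequantize kernel after truncation: it writes
// only the first `processed` elements of the output and leaves the rest alone.
void dequantize_partial(std::vector<float>& out, std::size_t processed,
                        float dequantized_value) {
    assert(processed <= out.size());
    for (std::size_t i = 0; i < processed; ++i) {
        out[i] = dequantized_value;
    }
}

// Simulates one request. Copying the previous block models an allocator that
// hands back memory last used by another tenant without clearing it, which is
// what an uninitialized allocation such as torch::empty permits.
std::vector<float> run_request(const std::vector<float>& previous_tenant_block,
                               std::size_t processed, float dequantized_value) {
    std::vector<float> out = previous_tenant_block;  // "uninitialized" output
    dequantize_partial(out, processed, dequantized_value);
    return out;  // elements [processed, size) still hold the other tenant's data
}
```

Consider an 8-element output where the kernel writes only 3 elements. The last 5 elements come back holding the previous tenant's values unchanged.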

Affected Systems

vLLM, the inference engine for large language models, is affected from version 0.5.5 up to, but not including, 0.23.1rc0. Version 0.23.1rc0 and later include the fix and are not affected.
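The following C++ sketch checks whether a given version falls inside the affected range. It assumes version strings of the form X.Y.Z or X.Y.ZrcN, ordered as in PEP 440, where a release candidate precedes the final release. Other forms, such as dev, post, or local versions, are rejected rather than guessed at. parse_version and is_affected are illustrative names.

```cpp
#include <cassert>
#include <regex>
#include <stdexcept>
#include <string>
#include <tuple>

// Sort key: (major, minor, patch, final-flag, rc-number).
using VersionKey = std::tuple<int, int, int, int, int>;

// Accepts only X.Y.Z or X.Y.ZrcN (an assumption of this sketch).
VersionKey parse_version(const std::string& v) {
    static const std::regex re(R"(^(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?$)");
    std::smatch m;
    if (!std::regex_match(v, m, re)) {
        throw std::invalid_argument("unsupported version string: " + v);
    }
    const bool is_rc = m[4].matched;
    return std::make_tuple(std::stoi(m[1].str()), std::stoi(m[2].str()),
                           std::stoi(m[3].str()),
                           is_rc ? 0 : 1,  // a release candidate sorts before the final release
                           is_rc ? std::stoi(m[4].str()) : 0);
}

// Affected range from the advisory: >= 0.5.5 and < 0.23.1rc0.
bool is_affected(const std::string& version) {
    const VersionKey v = parse_version(version);
    return parse_version("0.5.5") <= v && v < parse_version("0.23.1rc0");
}
```

With this ordering, 0.23.0 is reported as affected, while 0.23.1rc0 and 0.23.1 are not. This matches the `[0.5.5,0.23.1rc0)` range given in the version tables.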

Risk and Exploitability

The CVSS v4.0 score of 5.3 denotes moderate severity; a separate CVSS v3.1 assessment rates it 4.3 (Low). The EPSS score is below 1%, indicating a very low but nonzero probability of exploitation, and the vulnerability is not listed in the CISA KEV catalog. Exploitation requires the ability to submit inference requests to a deployment that shares a GPU with other tenants. The attacker could then read residual GPU memory left behind by other users' requests. The primary consequence is loss of confidentiality for tenant data in shared‑GPU environments.

Default status is the baseline for the product; each version can override it (e.g. patched versions marked unaffected).

Vendor	Product	Default status
vllm-project	vllm	affected

Version	Status	Constraints
`>= 0.5.5, < 0.23.1rc0`	affected	n/a

No data.

Vendor	Product	Confidence
Vllm-project	Vllm	100%

Version	Status	Scheme	Platform
`[0.5.5,0.23.1rc0)`	affected	generic	n/a


Remediation

No separate workaround is provided by the vendor; the issue is fixed in vLLM 0.23.1rc0.

OpenCVE Recommended Actions

Update vLLM to version 0.23.1rc0 or later to apply the dequantization kernel fix.
Reconfigure the deployment to isolate GPU resources per tenant or restrict multi‑tenant inference when upgrading is not possible.
If isolation is infeasible, sanitise GPU memory between inference requests, for example by explicitly clearing output tensors before use.

Generated by OpenCVE AI on June 29, 2026 at 14:15 UTC.
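This C++ sketch illustrates the recommendation to clear output tensors, together with a guard against the truncation itself. It is not the upstream patch, whose contents are not described here. It shows two generic defenses. The first is a range check that rejects out-of-range dimensions, which addresses the truncation directly. The second is a zero-initialized allocation, which only limits what an unwritten element can reveal. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical guard for the root cause: refuse to narrow a 64-bit dimension
// to int when it does not fit, instead of silently truncating it.
int checked_narrow_to_int(int64_t n) {
    if (n < std::numeric_limits<int>::min() ||
        n > std::numeric_limits<int>::max()) {
        throw std::out_of_range("tensor dimension does not fit in int");
    }
    return static_cast<int>(n);
}

// Hypothetical defensive allocation: zero-filled output, so an element that a
// kernel fails to write reads back as 0.0f instead of stale data. In LibTorch
// the analogous choice is torch::zeros rather than torch::empty.
std::vector<float> allocate_output_zeroed(std::size_t numel) {
    return std::vector<float>(numel, 0.0f);
}
```

Zero-filling adds a write over the whole buffer on every allocation, so it is better treated as a stopgap. The range check turns silent partial processing into an explicit error.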

Advisories

Source	ID	Title
Github GHSA	GHSA-5jv2-g5wq-cmr4	vLLM: GGUF dequantize kernel int truncation exposes uninitialized GPU memory in multi-tenant serving

CVSS v4.0

Attack Vector Network

Attack Complexity Low

Privileges Required None

Attack Requirements None

User Interaction Passive

Vulnerable System Confidentiality Impact Low

Vulnerable System Integrity Impact Low

Vulnerable System Availability Impact None

Subsequent System Confidentiality Impact None

Subsequent System Integrity Impact None

Subsequent System Availability Impact None

CVSS v3.1

Attack Vector Network

Attack Complexity Low

Privileges Required Low

Scope Unchanged

Confidentiality Impact Low

Integrity Impact None

Availability Impact None

User Interaction None
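The CVSS v3.1 metrics above can be cross-checked by evaluating the base-score formula from the CVSS v3.1 specification directly. Their vector, CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:N/A:N, is recorded in the history with a score of 4.3. This C++ sketch handles only Scope: Unchanged and hard-codes just the weights needed for this vector. The constant and function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Roundup as defined in the CVSS v3.1 specification (Appendix A): the
// smallest value with one decimal place that is >= x, computed on integers
// to avoid floating-point artifacts.
double roundup(double x) {
    const int64_t i = static_cast<int64_t>(std::llround(x * 100000.0));
    if (i % 10000 == 0) {
        return static_cast<double>(i) / 100000.0;
    }
    return (std::floor(static_cast<double>(i) / 10000.0) + 1.0) / 10.0;
}

// Metric weights from the CVSS v3.1 specification, limited to the values
// used by CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:N/A:N.
constexpr double kAttackVectorNetwork = 0.85;
constexpr double kAttackComplexityLow = 0.77;
constexpr double kPrivilegesLowScopeUnchanged = 0.62;
constexpr double kUserInteractionNone = 0.85;
constexpr double kImpactLow = 0.22;
constexpr double kImpactNone = 0.0;

// Base score for Scope: Unchanged only (Scope: Changed uses other formulas).
double base_score_scope_unchanged(double av, double ac, double pr, double ui,
                                  double c, double i, double a) {
    const double iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
    const double impact = 6.42 * iss;
    if (impact <= 0.0) {
        return 0.0;
    }
    const double exploitability = 8.22 * av * ac * pr * ui;
    return roundup(std::min(impact + exploitability, 10.0));
}
```

Here the impact subscore is 6.42 × 0.22 = 1.4124. The exploitability subscore is 8.22 × 0.85 × 0.77 × 0.62 × 0.85 ≈ 2.8353. Their sum, ≈ 4.2477, rounds up to 4.3.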

No CVSS v3.0

No CVSS v2

This CVE is not in the KEV list.

The EPSS score is 0.00281.

SSVC

Exploitation none

Automatable no

Technical Impact partial

References

Link	Providers
https://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e
https://github.com/vllm-project/vllm/pull/44971
https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4
https://nvd.nist.gov/vuln/detail/CVE-2026-53923
https://www.cve.org/CVERecord?id=CVE-2026-53923

History

Mon, 29 Jun 2026 12:15:00 +0000

Type	Values Removed	Values Added
Weaknesses		CWE-824
References		https://nvd.nist.gov/vuln/detail/CVE-2026-53923 https://www.cve.org/CVERecord?id=CVE-2026-53923
Metrics	threat_severity `None`	cvssV3_1 `{'score': 4.3, 'vector': 'CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:N/A:N'}` threat_severity `Low`

Tue, 23 Jun 2026 15:30:00 +0000

Type	Values Removed	Values Added
Metrics		ssvc `{'options': {'Automatable': 'no', 'Exploitation': 'none', 'Technical Impact': 'partial'}, 'version': '2.0.3'}`

Tue, 23 Jun 2026 01:30:00 +0000

Type	Values Removed	Values Added
First Time appeared		Vllm-project Vllm-project vllm
Vendors & Products		Vllm-project Vllm-project vllm

Mon, 22 Jun 2026 22:45:00 +0000

Type	Values Removed	Values Added
Description		vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
Title		vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow
Weaknesses		CWE-200 CWE-681
References		https://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e https://github.com/vllm-project/vllm/pull/44971 https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4
Metrics		cvssV4_0 `{'score': 5.3, 'vector': 'CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N'}`

MITRE

Status: PUBLISHED

Assigner: GitHub_M

Published: 2026-06-22T21:55:42.001Z

Updated: 2026-06-23T15:05:21.711Z

Reserved: 2026-06-11T15:46:12.316Z

Link: CVE-2026-53923

Vulnrichment

Updated: 2026-06-23T15:04:19.969Z

NVD

No data.

Redhat

Severity : Low

Public Date: 2026-06-22T21:55:42Z

Links: CVE-2026-53923 - Bugzilla

OpenCVE Enrichment

Updated: 2026-06-29T14:30:18Z

Weaknesses

CWE-200
Exposure of Sensitive Information to an Unauthorized Actor
CWE-681
Incorrect Conversion between Numeric Types
CWE-824
Access of Uninitialized Pointer
