Description
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
Published: 2026-06-22
Score: 5.3 Medium
EPSS: n/a
KEV: No
Impact: n/a
Action: n/a
AI Analysis

Impact

vLLM's GGUF dequantize kernels truncate tensor dimensions from int64_t to int. The output tensor is allocated at its full size, but the CUDA kernel processes only the truncated number of elements, so the rest of the tensor is never written and keeps whatever was previously in GPU memory. In a multi-tenant inference setup this stale memory can contain tensor data from other users' requests, which an attacker may be able to read. The flaw combines an incorrect conversion between numeric types (CWE-681) with an information disclosure weakness (CWE-200).
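The arithmetic behind the truncation can be sketched as follows. This is a minimal illustration, not vLLM's code: the function names are hypothetical, `ctypes.c_int32` stands in for a 32-bit C `int`, and treating a negative truncated count as "nothing processed" is an assumption about how a kernel launch would behave.

```python
import ctypes


def truncated_count(num_elements: int) -> int:
    """Value left after a 64-bit element count is narrowed to a 32-bit signed int.

    ctypes integer types do no overflow checking, so the high bits are dropped.
    """
    return ctypes.c_int32(num_elements).value


def unwritten_elements(num_elements: int) -> int:
    """Elements of a full-size output buffer that a truncated loop never writes.

    Assumption: a negative truncated count means no elements are processed.
    """
    processed = max(truncated_count(num_elements), 0)
    return num_elements - min(processed, num_elements)


if __name__ == "__main__":
    n = 2**32 + 256  # hypothetical element count just past the 32-bit range
    print(truncated_count(n), unwritten_elements(n))
```

For counts that fit in 32 bits the two functions agree with the untruncated case, which is why the problem only appears for very large tensors.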

Affected Systems

vLLM, the inference and serving engine for large language models, is affected from version 0.5.5 up to, but not including, 0.23.1rc0. Version 0.23.1rc0 and newer include the fix and are not affected.
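A deployment inventory could test installed versions against this range with something like the sketch below. The helper names are hypothetical and the parser is deliberately simplified: it understands only dotted numeric releases with an optional `rcN` suffix and ignores other suffixes such as `.post1`. A full PEP 440 parser such as `packaging.version.Version` would be more robust if that package is available.

```python
import re

FIRST_AFFECTED = "0.5.5"   # from the advisory
FIRST_FIXED = "0.23.1rc0"  # from the advisory


def version_key(version: str) -> tuple:
    """Sortable key for 'X.Y.Z' or 'X.Y.ZrcN' (simplified, not full PEP 440)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)(?:rc(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    release = [int(part) for part in match.group(1).split(".")]
    release += [0] * (4 - len(release))  # pad so 0.23 and 0.23.0 compare equal
    # A release candidate sorts before the final release of the same number.
    pre = (0, int(match.group(2))) if match.group(2) is not None else (1, 0)
    return tuple(release[:4]) + pre


def is_affected(version: str) -> bool:
    """True if FIRST_AFFECTED <= version < FIRST_FIXED."""
    key = version_key(version)
    return version_key(FIRST_AFFECTED) <= key < version_key(FIRST_FIXED)


if __name__ == "__main__":
    for v in ("0.5.4", "0.5.5", "0.23.0", "0.23.1rc0", "0.23.1"):
        print(v, is_affected(v))
```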

Risk and Exploitability

The CVSS v4.0 base score of 5.3 is rated Medium. No EPSS score is available, and the vulnerability is not listed in the CISA KEV catalog. To exploit it, an attacker must be able to submit inference requests to a vLLM instance whose GPU also serves other tenants; the unfilled part of the output tensor may then expose data that those tenants' requests left in GPU memory. The primary consequence is loss of confidentiality for tenant data in shared-GPU environments.

Generated by OpenCVE AI on June 22, 2026 at 23:23 UTC.

Remediation

The vulnerability is fixed in vLLM 0.23.1rc0. No vendor workaround is listed.

OpenCVE Recommended Actions

  • Update vLLM to version 0.23.1rc0 or later to apply the dequantization kernel fix.
  • Reconfigure the deployment to isolate GPU resources per tenant or restrict multi‑tenant inference when upgrading is not possible.
  • If isolation is infeasible, sanitise GPU memory between inference requests, for example by explicitly clearing output tensors before use.
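Why clearing an output buffer helps can be shown with a small CPU-side simulation. This is only an analogy, not vLLM or CUDA code: a `bytearray` stands in for a recycled GPU allocation, the "stale" bytes are made up, and `partial_write` is a hypothetical stand-in for a kernel that processes fewer elements than the buffer holds.

```python
def partial_write(buffer: bytearray, data: bytes, processed: int) -> None:
    """Write only the first `processed` bytes, like a kernel with a truncated count."""
    buffer[:processed] = data[:processed]


def run(clear_first: bool) -> bytes:
    """Return the buffer contents a caller would see after a partial write."""
    # Recycled allocation that still holds another request's bytes (made-up data).
    buffer = bytearray(b"SECRET-FROM-OTHER-TENANT")
    if clear_first:
        buffer[:] = bytes(len(buffer))  # zero-fill before use
    partial_write(buffer, b"x" * len(buffer), processed=4)
    return bytes(buffer)


if __name__ == "__main__":
    print(run(clear_first=False))  # the tail still shows the stale bytes
    print(run(clear_first=True))   # the tail is zeros
```

Zero-filling does not make the partial result correct; it only keeps the unwritten region from carrying another request's data, so upgrading remains the actual fix.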


Advisories
Source: GitHub GHSA
ID: GHSA-5jv2-g5wq-cmr4
Title: vLLM: GGUF dequantize kernel int truncation exposes uninitialized GPU memory in multi-tenant serving
History

Mon, 22 Jun 2026 22:45:00 +0000

Values added (no values removed):
Description: identical to the Description at the top of this page.
Title: vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow
Weaknesses: CWE-200, CWE-681
References
Metrics: cvssV4_0 {'score': 5.3, 'vector': 'CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N'}
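
The vector string in the metrics entry above packs one metric per slash-separated field. A minimal sketch of splitting it into a dictionary is shown here; the function name is hypothetical, and it only separates the fields without validating them against the CVSS v4.0 specification or computing a score.

```python
def parse_cvss_vector(vector: str) -> dict:
    """Split 'CVSS:4.0/AV:N/...' into {'version': '4.0', 'AV': 'N', ...}."""
    prefix, *metrics = vector.split("/")
    name, _, version = prefix.partition(":")
    if name != "CVSS" or not version:
        raise ValueError(f"not a CVSS vector: {vector!r}")
    parsed = {"version": version}
    for metric in metrics:
        key, sep, value = metric.partition(":")
        if not sep or not key or not value:
            raise ValueError(f"malformed metric: {metric!r}")
        parsed[key] = value
    return parsed


if __name__ == "__main__":
    vector = "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N"
    fields = parse_cvss_vector(vector)
    # AV = attack vector, UI = user interaction, VC = confidentiality impact
    # on the vulnerable system (CVSS v4.0 metric abbreviations).
    print(fields["AV"], fields["UI"], fields["VC"])
```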


MITRE

Status: PUBLISHED

Assigner: GitHub_M

Published:

Updated: 2026-06-22T21:55:42.001Z

Reserved: 2026-06-11T15:46:12.316Z

Link: CVE-2026-53923

Vulnrichment

No data.

NVD

No data.

Redhat

No data.

OpenCVE Enrichment

Updated: 2026-06-22T23:30:05Z

Weaknesses
  • CWE-200

    Exposure of Sensitive Information to an Unauthorized Actor

  • CWE-681

    Incorrect Conversion between Numeric Types