CVE-2026-44223 - Vulnerability Details

- vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters

Description

vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.

Published: 2026-05-12

Score: 6.5 Medium

EPSS: < 1% Very Low

KEV: No


Analysis

Impact

A bug in the extract_hidden_states speculative decoding proposer causes it to return a tensor with an incorrect shape after the first decode step. When any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty), the mis-shaped tensor causes a RuntimeError that terminates the EngineCore process; a single request with a penalty parameter (e.g., repetition_penalty = 1.1) is sufficient to crash the server. The flaw is classified as CWE-131 (Incorrect Calculation of Buffer Size) and CWE-704 (Incorrect Type Conversion or Cast), and it is fixed in version 0.20.0. It does not enable arbitrary code execution or data exfiltration; its impact is denial of service. Because the EngineCore process itself terminates, the outage affects every client of the server, not only the request that triggered it.
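The batch-level trigger condition can be illustrated with a short Python sketch. It assumes requests are plain dictionaries carrying the three penalty keys named in the advisory, and it treats a request as penalized only when a value differs from the parameter's neutral (no-op) value: 1.0 for repetition_penalty, 0.0 for the other two. Whether an explicitly neutral value avoids the crash is an assumption, not something the advisory states. The helper names are hypothetical and are not part of vLLM.

```python
from typing import Any, Iterable, Mapping

# Neutral (no-op) value of each penalty parameter named in the advisory.
# ASSUMPTION: a request that only sets neutral values does not hit the crash path.
NEUTRAL_PENALTIES = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def request_uses_penalties(request: Mapping[str, Any]) -> bool:
    """Return True if the request sets any penalty parameter to a non-neutral value."""
    return any(
        request.get(key) is not None and float(request[key]) != neutral
        for key, neutral in NEUTRAL_PENALTIES.items()
    )


def batch_hits_crash_path(batch: Iterable[Mapping[str, Any]]) -> bool:
    """Per the advisory, one penalized request anywhere in the batch is enough."""
    return any(request_uses_penalties(request) for request in batch)


if __name__ == "__main__":
    batch = [{"prompt": "hello"}, {"prompt": "hi", "repetition_penalty": 1.1}]
    print(batch_hits_crash_path(batch))  # True: the second request is penalized
```

The batch-level `any` mirrors the advisory's wording: the other requests in the batch need not use penalties for the whole EngineCore process to fail.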

Affected Systems

The vLLM inference engine (product vllm, maintained by vllm-project) is affected. Versions from 0.18.0 up to, but not including, 0.20.0 are vulnerable; version 0.20.0 and later contain the fix.

Risk and Exploitability

The CVSS v3.1 base score of 6.5 indicates moderate severity, and the EPSS score of 0.367% (reported as < 1%) indicates a very low estimated probability of exploitation in the next 30 days. The vulnerability is not listed in the CISA KEV catalog. An attacker can trigger the crash remotely by sending a single request that includes a penalty parameter to any reachable vLLM instance that runs an affected version with the extract_hidden_states speculative decoding proposer enabled. The CVSS vector rates Privileges Required as Low: the attacker needs only the ordinary access required to submit inference requests, and no elevated privileges.

Default status is the baseline for the product; each version can override it (e.g., patched versions marked unaffected).

Vendor	Product	Default status
vllm-project	vllm	affected

Version	Status	Constraints
`>= 0.18.0, < 0.20.0`	affected	—

Configuration 1

cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*

No data.

Vendor	Product	Confidence
Vllm-project	Vllm	100%

Version	Status	Scheme	Platform
`[0.18.0,0.20.0)`	affected	semver	—
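The half-open range `[0.18.0,0.20.0)` includes 0.18.0 and excludes 0.20.0. The minimal Python sketch below expresses that check, assuming plain numeric `X.Y.Z` release strings; pre-release, dev, and local version suffixes are not handled, and `packaging.version.Version` is the more robust choice for real inventories. The names `parse_release` and `is_affected` are hypothetical.

```python
AFFECTED_MIN = (0, 18, 0)  # inclusive lower bound of the affected range
FIXED_IN = (0, 20, 0)      # exclusive upper bound: first fixed release


def parse_release(version: str) -> tuple:
    """Parse a plain numeric release such as '0.19.1' into (major, minor, patch)."""
    parts = [int(part) for part in version.strip().split(".")]
    # Pad short versions such as "0.19" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def is_affected(version: str) -> bool:
    """True if the version lies in [0.18.0, 0.20.0)."""
    return AFFECTED_MIN <= parse_release(version) < FIXED_IN


if __name__ == "__main__":
    for v in ("0.17.1", "0.18.0", "0.19", "0.20.0"):
        print(v, is_affected(v))
```

Tuple comparison in Python is lexicographic, so the chained comparison implements the half-open interval directly.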


Remediation

No vendor workaround is provided; the vendor fix is included in vLLM 0.20.0.

OpenCVE Recommended Actions

Update vLLM to 0.20.0 or later, which contains the fix for the tensor shape bug.
Until the update is applied, reject or strip the penalty parameters (repetition_penalty, frequency_penalty, presence_penalty) from incoming request payloads.
Monitor the application logs for RuntimeError messages and unexpected EngineCore process terminations, which indicate that the crash has been triggered.
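The payload-filtering recommendation can be sketched as a step applied before requests reach vLLM, for example in a reverse proxy or API gateway. This Python sketch assumes the penalty parameters arrive as top-level keys of the JSON request body; `strip_penalties` is a hypothetical helper, not part of vLLM. It removes the keys regardless of their values, which is the conservative choice.

```python
import copy
import json
from typing import Any, Dict, List, Tuple

# The three sampling parameters named in the advisory.
PENALTY_KEYS = ("repetition_penalty", "frequency_penalty", "presence_penalty")


def strip_penalties(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of the request body without penalty parameters,
    plus the list of removed keys (useful for logging or rejecting)."""
    cleaned = copy.deepcopy(payload)  # leave the caller's dict untouched
    removed = [key for key in PENALTY_KEYS if key in cleaned]
    for key in removed:
        del cleaned[key]
    return cleaned, removed


if __name__ == "__main__":
    body = json.loads('{"model": "m", "prompt": "hi", "repetition_penalty": 1.1}')
    cleaned, removed = strip_penalties(body)
    print(cleaned, removed)
```

Stripping silently changes sampling behavior for clients that rely on penalties; returning an HTTP 400 error whenever `removed` is non-empty is a more transparent alternative.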

Generated by OpenCVE AI on June 23, 2026 at 00:21 UTC.


Advisories

Source	ID	Title
Github GHSA	GHSA-83vm-p52w-f9pw	vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters

No CVSS v4.0

CVSS v3.1: 6.5 (CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H)

Metric	Value
Attack Vector	Network
Attack Complexity	Low
Privileges Required	Low
Scope	Unchanged
Confidentiality Impact	None
Integrity Impact	None
Availability Impact	High
User Interaction	None
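The 6.5 base score follows from the CVSS v3.1 metrics listed above. The Python sketch below reimplements the CVSS v3.1 base-score formula using the metric weights and Roundup function from the FIRST CVSS v3.1 specification. It is an illustrative reimplementation covering base metrics only, not the calculator used by NVD or OpenCVE.

```python
import math

# CVSS v3.1 base metric weights.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weight depends on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {"U": {"N": 0.85, "L": 0.62, "H": 0.27}, "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score from a 'CVSS:3.1/AV:../...' vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    scope = metrics["S"]
    # Impact Sub-Score from Confidentiality, Integrity, Availability.
    iss = 1 - math.prod(1 - WEIGHTS["CIA"][metrics[m]] for m in ("C", "I", "A"))
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[scope][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))
```

For this CVE, Impact = 6.42 × 0.56 = 3.5952 and Exploitability = 8.22 × 0.85 × 0.77 × 0.62 × 0.85 ≈ 2.8353, so the base score is Roundup(6.4305) = 6.5. Only Availability contributes to Impact, which matches the denial-of-service nature of the flaw.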

No CVSS v3.0

No CVSS v2

This CVE is not in the KEV list.

The EPSS score is 0.00367.

SSVC

Decision point	Value
Exploitation	poc
Automatable	no
Technical Impact	partial

References

Link	Providers
https://github.com/vllm-project/vllm/pull/38610
https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw

History

Mon, 22 Jun 2026 22:00:00 +0000

Type	Values Removed	Values Added
Description	vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.	vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.

Fri, 15 May 2026 15:15:00 +0000

Type	Values Removed	Values Added
Metrics		ssvc `{'options': {'Automatable': 'no', 'Exploitation': 'poc', 'Technical Impact': 'partial'}, 'version': '2.0.3'}`

Thu, 14 May 2026 15:45:00 +0000

Type	Values Removed	Values Added
First Time appeared		Vllm Vllm vllm
CPEs		cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*
Vendors & Products		Vllm Vllm vllm

Tue, 12 May 2026 23:30:00 +0000

Type	Values Removed	Values Added
First Time appeared		Vllm-project Vllm-project vllm
Vendors & Products		Vllm-project Vllm-project vllm

Tue, 12 May 2026 20:15:00 +0000

Type	Values Removed	Values Added
Description		vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
Title		vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
Weaknesses		CWE-131 CWE-704
References		https://github.com/vllm-project/vllm/pull/38610 https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw
Metrics		cvssV3_1 `{'score': 6.5, 'vector': 'CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H'}`


MITRE

Status: PUBLISHED

Assigner: GitHub_M

Published: 2026-05-12T19:58:40.862Z

Updated: 2026-06-22T21:49:24.277Z

Reserved: 2026-05-05T15:42:40.518Z

Link: CVE-2026-44223

Vulnrichment

Updated: 2026-05-15T14:43:40.735Z

NVD

Status: Modified

Published: 2026-05-12T20:16:43.293

Modified: 2026-06-17T10:50:23.040

Link: CVE-2026-44223

Redhat

No data.

OpenCVE Enrichment

Updated: 2026-06-23T00:30:06Z

Weaknesses

CWE-131
Incorrect Calculation of Buffer Size
CWE-704
Incorrect Type Conversion or Cast
