Description
vLLM is an inference and serving engine for large language models (LLMs). In versions prior to 0.20.0, the extract_hidden_states speculative decoding proposer returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses a sampling penalty parameter (repetition_penalty, frequency_penalty, or presence_penalty); a single request with one such parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
Published: 2026-05-12
Score: 6.5 Medium
EPSS: < 1% Very Low
KEV: No
Impact: n/a
Action: n/a
AI Analysis

Impact

A bug in the extract_hidden_states speculative decoding proposer causes it to return a tensor of incorrect shape after the first decode step. The shape mismatch raises a RuntimeError that terminates the EngineCore process, so on a server using this proposer, any request that includes a sampling penalty parameter (repetition_penalty, frequency_penalty, or presence_penalty) crashes the server. The flaw does not enable arbitrary code execution or data exfiltration; its impact is loss of availability. Because the engine process itself terminates, the outage affects all requests served by that instance, not only the request that triggered it.

Affected Systems

The vLLM inference engine, maintained by the vllm-project under the product name vllm, is affected. Versions of the software released prior to 0.20.0 are vulnerable; the 0.20.0 release and later versions contain the fix.
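Checking whether a given deployment falls in the affected range can be scripted. The sketch below is a minimal, stdlib-only illustration, not an official vLLM tool: it assumes vLLM is installed as the `vllm` distribution and reads its version with `importlib.metadata`. The helper names `is_vulnerable` and `installed_vllm_is_vulnerable` are hypothetical. It conservatively treats pre-release builds of 0.20.0 (such as `0.20.0rc1`) and unparseable version strings as vulnerable, because this record does not say which pre-releases include the fix.

```python
import re
from importlib import metadata
from typing import Optional

# First release that this advisory lists as containing the fix.
FIXED_RELEASE = (0, 20, 0)

# Captures MAJOR.MINOR[.PATCH] and whatever follows (pre/post/dev/local tags).
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(.*)$")
# PEP 440-style pre-release and dev markers, e.g. "rc1", "a2", ".dev3".
_PRE_RELEASE_RE = re.compile(r"^[.-]?(a|b|rc|dev)")


def is_vulnerable(version: str) -> bool:
    """Return True if `version` predates the 0.20.0 fix (conservative)."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        return True  # unparseable: assume vulnerable
    major, minor, patch, suffix = match.groups()
    release = (int(major), int(minor), int(patch or 0))
    if release != FIXED_RELEASE:
        return release < FIXED_RELEASE
    # Same release number: pre-releases of 0.20.0 are treated as vulnerable,
    # while post-releases (".post1") and local tags ("+cu121") are not.
    return _PRE_RELEASE_RE.match(suffix) is not None


def installed_vllm_is_vulnerable() -> Optional[bool]:
    """Check the locally installed vllm package; None if it is not installed."""
    try:
        return is_vulnerable(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None
```

Run in the serving environment, `installed_vllm_is_vulnerable()` returns True, False, or None when the `vllm` package is not installed there.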

Risk and Exploitability

The CVSS v3.1 base score of 6.5 (Medium) reflects a high availability impact with no confidentiality or integrity impact. The EPSS score is below 1%, indicating a very low estimated probability of exploitation, and the vulnerability is not listed in the CISA KEV catalog. An attacker can trigger the crash remotely by sending a single request with a penalty parameter to any reachable vLLM instance that uses the extract_hidden_states proposer. The CVSS vector rates the privileges required as low (PR:L): the attacker needs only the ordinary ability to submit inference requests, with no elevated access.
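The 6.5 base score can be reproduced from the CVSS v3.1 vector recorded for this CVE, CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H. The sketch below implements the base-score equations and metric weights from the CVSS v3.1 specification, covering base metrics only (temporal and environmental metrics are ignored); the function names are illustrative. For this vector, confidentiality and integrity contribute nothing, so the whole score comes from the high availability impact (A:H) combined with a network-reachable, low-complexity attack.

```python
import math

# Base-metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100_000)  # guards against float noise
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a 'CVSS:3.1/...' vector string."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix}")
    m = dict(part.split(":") for part in parts)
    changed = m["S"] == "C"
    # Impact sub-score from confidentiality, integrity, availability.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 6.5
```

For this vector the impact is 6.42 × 0.56 = 3.5952 and the exploitability is 8.22 × 0.85 × 0.77 × 0.62 × 0.85 ≈ 2.8353; their sum, about 6.4305, rounds up to 6.5.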

Generated by OpenCVE AI on May 12, 2026 at 21:40 UTC.

Remediation

Vendor fix: vLLM 0.20.0. No official workaround is listed.

OpenCVE Recommended Actions

  • Upgrade vLLM to 0.20.0 or later, which contains the fix for the tensor shape bug.
  • Until the upgrade is applied, strip or reject the penalty parameters (repetition_penalty, frequency_penalty, presence_penalty) in incoming request payloads.
  • Monitor application logs for RuntimeError messages and unexpected EngineCore process terminations to detect exploitation attempts early.
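The interim mitigation and the monitoring step can be sketched as a small request filter and a log check. This is a minimal sketch, not a vLLM feature: it assumes requests arrive as JSON bodies (for example, OpenAI-style completion requests) with the penalty parameters as top-level fields, and that a proxy or gateway in front of vLLM can rewrite or reject a body before forwarding it. The helpers `strip_penalty_params` and `find_runtime_errors` are hypothetical names. Returning the removed keys lets the caller choose between silently dropping them and rejecting the request; rejecting is usually clearer to clients, because dropping a penalty changes the sampling behavior they asked for.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Penalty parameters named in the advisory; any one of them in a batched
# request is enough to trigger the crash on affected versions.
PENALTY_PARAMS = ("repetition_penalty", "frequency_penalty", "presence_penalty")


def strip_penalty_params(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of `payload` without penalty fields, plus the names removed.

    Assumes the penalties are top-level keys of the request body.
    """
    removed = [name for name in PENALTY_PARAMS if name in payload]
    cleaned = {key: value for key, value in payload.items() if key not in PENALTY_PARAMS}
    return cleaned, removed


def find_runtime_errors(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention RuntimeError, the error behind the crash."""
    return [line.rstrip("\n") for line in log_lines if "RuntimeError" in line]


if __name__ == "__main__":
    # Hypothetical request body; "my-model" is a placeholder model name.
    body = {"model": "my-model", "prompt": "Hello", "repetition_penalty": 1.1}
    cleaned, removed = strip_penalty_params(body)
    print(cleaned)   # {'model': 'my-model', 'prompt': 'Hello'}
    print(removed)   # ['repetition_penalty']
```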



Advisories
Source: GitHub (GHSA)
ID: GHSA-83vm-p52w-f9pw
Title: vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
History

Fri, 15 May 2026 15:15:00 +0000

Values removed: none
Values added:
  • Metrics (ssvc): {'options': {'Automatable': 'no', 'Exploitation': 'poc', 'Technical Impact': 'partial'}, 'version': '2.0.3'}


Thu, 14 May 2026 15:45:00 +0000

Values removed: none
Values added:
  • First Time appeared: Vllm; Vllm vllm
  • CPEs: cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*
  • Vendors & Products: Vllm; Vllm vllm

Tue, 12 May 2026 23:30:00 +0000

Values removed: none
Values added:
  • First Time appeared: Vllm-project; Vllm-project vllm
  • Vendors & Products: Vllm-project; Vllm-project vllm

Tue, 12 May 2026 20:15:00 +0000

Values removed: none
Values added:
  • Description: identical to the Description at the top of this record
  • Title: vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
  • Weaknesses: CWE-131, CWE-704
  • References:
  • Metrics (cvssV3_1): {'score': 6.5, 'vector': 'CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H'}


MITRE

Status: PUBLISHED

Assigner: GitHub_M

Published:

Updated: 2026-05-15T14:46:25.695Z

Reserved: 2026-05-05T15:42:40.518Z

Link: CVE-2026-44223

Vulnrichment

Updated: 2026-05-15T14:43:40.735Z

NVD

Status: Modified

Published: 2026-05-12T20:16:43.293

Modified: 2026-05-15T15:16:52.560

Link: CVE-2026-44223

Red Hat

No data.

OpenCVE Enrichment

Updated: 2026-05-12T23:15:26Z

Weaknesses