Impact
A bug in the extract_hidden_states speculative decoding proposer causes it to return a tensor with an incorrect shape after the first decode step. The shape mismatch raises a RuntimeError that terminates the EngineCore process. The flaw combines a size handling error with a type conversion mistake, consistent with CWE‑131 (incorrect size handling) and CWE‑704 (incorrect type conversion). The crash is triggered when any request in the batch uses a sampling penalty parameter (repetition_penalty, frequency_penalty, or presence_penalty); a single such request (e.g., repetition_penalty = 1.1) is sufficient to crash the server. The vulnerability is fixed in version 0.20.0. The flaw does not enable arbitrary code execution or data exfiltration; its impact is limited to denial of service.
Affected Systems
The vLLM inference engine, maintained by the vllm-project under the product name vllm, is affected. Versions released prior to 0.20.0 are vulnerable; version 0.20.0 and later include the fix.
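The version boundary above can be checked mechanically. The following is a minimal sketch, not an official tool: the helper names parse_release and is_vulnerable are hypothetical, and the parser compares only the leading numeric release segment, so pre-release or locally patched builds would need a closer look.

```python
import re

# First release stated by the advisory to contain the fix.
FIXED_VERSION = (0, 20, 0)


def parse_release(version: str) -> tuple:
    """Return the leading numeric release segment as a tuple of ints.

    Suffixes such as '+cu118' or 'rc1' are ignored (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad to three components so that '0.20' compares equal to '0.20.0'.
    return parts + (0,) * (3 - len(parts))


def is_vulnerable(version: str) -> bool:
    """True if the version predates the fixed release (hypothetical helper)."""
    return parse_release(version) < FIXED_VERSION
```

On a host where the package is installed, the version string could be obtained with importlib.metadata.version("vllm") from the Python standard library and passed to is_vulnerable.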
Risk and Exploitability
The CVSS score of 6.5 corresponds to medium severity, and the EPSS score of 0.367% (reported as <1%) indicates a very low likelihood of exploitation. The vulnerability is not listed in the CISA KEV catalog. An attacker can trigger the crash remotely against any accessible vLLM instance by sending a single request that includes one of the penalty parameters. The only requirement is the ability to submit such a request; no additional privileged access is needed.
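Where an immediate upgrade is not possible, a gateway in front of the server could flag or strip the triggering parameters. The sketch below is illustrative only: the function name active_penalties is hypothetical, the payload is assumed to be a JSON request body already parsed into a dict, and treating the neutral values (1.0 for repetition_penalty, 0.0 for the other two) as safe is an assumption the advisory does not confirm.

```python
# Values at which each penalty has no effect on sampling.
# Assumption: requests carrying only these neutral values do not trigger the crash.
NEUTRAL_PENALTIES = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def active_penalties(payload: dict) -> list:
    """Return the names of penalty parameters set to a non-neutral value."""
    found = []
    for name, neutral in NEUTRAL_PENALTIES.items():
        value = payload.get(name)
        # Missing or null parameters fall back to server defaults.
        if value is not None and float(value) != neutral:
            found.append(name)
    return found


request_body = {"prompt": "Hello", "repetition_penalty": 1.1}
flagged = active_penalties(request_body)  # names a gateway could reject or remove
```

A stricter variant would reject any request that mentions these keys at all, which avoids relying on the neutral-value assumption.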