Impact
A bug in the extract_hidden_states speculative decoding proposer causes it to return a tensor of the wrong shape after the first decode step. The shape mismatch raises a RuntimeError that terminates the EngineCore process. As a result, a server configured to use this proposer crashes whenever a request sets a sampling penalty parameter such as repetition_penalty, frequency_penalty, or presence_penalty. The flaw does not allow arbitrary code execution or data exfiltration; its impact is limited to denial of service. Because the whole server goes down, that affects every client of the instance, not only the session that sent the request.
Affected Systems
The vLLM inference engine (vendor: vllm-project, product: vllm) is affected. All versions prior to 0.20.0 are vulnerable; version 0.20.0 and later contain the fix.
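To check whether an installation falls in the affected range, a minimal sketch like the following compares the installed version against 0.20.0. It assumes a pip-style installation where the distribution is named `vllm`. The helper names (`split_version`, `is_vulnerable`, `installed_vllm_version`) are hypothetical, and the parser handles only common PEP 440 forms (final, pre-release, dev, post, and local versions), ignoring epochs.

```python
from __future__ import annotations

import re
from importlib import metadata

# First release that contains the fix, per the advisory.
FIXED_RELEASE = (0, 20, 0)

# Leading numeric release segment (e.g. "0.19.1") followed by any suffix.
_RELEASE_RE = re.compile(r"(\d+(?:\.\d+)*)(.*)")
# PEP 440 pre-release and dev markers, which sort *before* the final release.
_PRE_RE = re.compile(r"[._-]?(a|b|c|rc|alpha|beta|pre|preview|dev)", re.IGNORECASE)


def split_version(version: str) -> tuple[tuple[int, ...], str]:
    """Hypothetical helper: split '0.20.0rc1' into ((0, 20, 0), 'rc1')."""
    match = _RELEASE_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # Pad short versions so "0.20" compares equal to (0, 20, 0).
    release += (0,) * (3 - len(release))
    return release, match.group(2)


def is_vulnerable(version: str) -> bool:
    """True if `version` sorts before 0.20.0 (pre-releases of 0.20.0 count as before)."""
    release, suffix = split_version(version)
    if release != FIXED_RELEASE:
        return release < FIXED_RELEASE
    # Same release number: only a pre-release/dev suffix makes it older than 0.20.0.
    return _PRE_RE.match(suffix) is not None


def installed_vllm_version() -> str | None:
    """Return the installed vllm distribution version, or None if it is absent."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vllm is not installed")
    else:
        status = "vulnerable" if is_vulnerable(found) else "contains the fix"
        print(f"vllm {found}: {status}")
```

For full PEP 440 semantics, comparing `packaging.version.Version` objects is the more robust choice; the standard-library version above simply avoids the extra dependency.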
Risk and Exploitability
The CVSS score of 6.5 corresponds to medium severity. No EPSS score is available, so the likelihood of exploitation has not been quantified, and the vulnerability is not listed in the CISA Known Exploited Vulnerabilities (KEV) catalog. An attacker can trigger the crash remotely with a single request that includes a penalty parameter, against any reachable vLLM instance that uses the affected proposer. The only requirement is the ability to submit such a request payload; no further privileges are needed.
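Upgrading to 0.20.0 or later is the fix the advisory points to. For deployments that cannot upgrade immediately, a hypothetical gateway check could flag request bodies that carry the penalty parameters named above. This is a sketch, not an official mitigation. It assumes the parameters arrive as top-level keys of a JSON request body, and because the advisory does not say whether a neutral value (for example, a repetition_penalty of 1.0) also triggers the crash, it flags any non-null value.

```python
from typing import Any, List, Mapping

# The three penalty parameters named in the advisory.
PENALTY_KEYS = ("repetition_penalty", "frequency_penalty", "presence_penalty")


def penalty_keys_present(payload: Mapping[str, Any]) -> List[str]:
    """Hypothetical helper: list the penalty parameters set in a request body.

    Assumes penalties are top-level keys; a key explicitly set to null is ignored.
    """
    return [key for key in PENALTY_KEYS if payload.get(key) is not None]


if __name__ == "__main__":
    # "my-model" is a placeholder model name.
    request = {"model": "my-model", "prompt": "Hello", "presence_penalty": 0.5}
    print(penalty_keys_present(request))  # ['presence_penalty']
```

A gateway could reject or strip such requests before they reach the server. That trades penalty-based sampling for availability, so it only makes sense as a stopgap until the upgrade.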
OpenCVE Enrichment
GitHub GHSA