Impact
The vulnerability stems from the lack of an upper bound on the 'n' parameter in vLLM's OpenAI-compatible API server. An unauthenticated attacker can send a request with an astronomically large 'n' value, causing the server to allocate millions of request objects before the request ever reaches the scheduling queue. The allocation loop blocks the Python asyncio event loop, stalling all other clients, and memory exhaustion then triggers an immediate Out-of-Memory crash, denying service entirely.
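The class of fix is straightforward: validate 'n' before any per-sequence allocation happens. The sketch below is illustrative only; the `MAX_N` cap and the `validate_n` helper are hypothetical names, not vLLM's actual implementation.

```python
# Minimal sketch of bounding 'n' before per-sequence allocation.
# MAX_N is a hypothetical cap, not vLLM's actual configuration value.
MAX_N = 128

def validate_n(n: int) -> int:
    """Reject an out-of-range 'n' before the request reaches the scheduler."""
    if not (1 <= n <= MAX_N):
        raise ValueError(f"'n' must be between 1 and {MAX_N}, got {n}")
    return n

validate_n(4)  # a reasonable request passes

try:
    validate_n(10**9)  # an astronomically large 'n' is rejected up front
    rejected = False
except ValueError:
    rejected = True
```

Because the check runs before any objects are allocated, a hostile value costs only a single comparison rather than millions of allocations on the event loop.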
Affected Systems
The affected product is the vllm-project vllm inference and serving engine for large language models. Deployments running versions 0.1.0 through 0.18.x are affected; the issue is resolved in vLLM 0.19.0 and later.
Risk and Exploitability
The CVSS score of 6.5 rates the vulnerability as medium severity, with a high availability impact and no authentication required. No EPSS score is available, so the likelihood of exploitation cannot be quantified, and the vulnerability is not listed in the CISA KEV catalog. Based on the description, the attack vector is remote: an unauthenticated HTTP request sent to the API server's completion endpoint.
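The shape of such a request can be sketched as follows. The endpoint path follows the OpenAI-compatible convention and the model name is a placeholder; the payload is only constructed here, not sent.

```python
import json

# Illustrative payload for an OpenAI-compatible /v1/completions endpoint.
# The oversized 'n' is what triggers the mass allocation described above;
# "example-model" is a placeholder name.
payload = {
    "model": "example-model",
    "prompt": "hi",
    "n": 2**31,        # no upper bound enforced in affected versions
    "max_tokens": 1,
}
body = json.dumps(payload)
```

A single small request body like this suffices; the cost asymmetry (a few bytes in, millions of allocations on the server) is what makes the denial of service cheap for an unauthenticated attacker.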
OpenCVE Enrichment
GitHub GHSA