Impact
An unauthenticated attacker can send a single HTTP request containing an extremely large n parameter to vLLM's OpenAI‑compatible API server. Because the server does not bound n, it allocates millions of per‑sample request objects before they ever reach the scheduling queue, exhausting heap memory and blocking the Python asyncio event loop. The result is an immediate Out‑of‑Memory crash that takes the process down and denies service to legitimate users. The weakness is classified as CWE‑770 (Allocation of Resources Without Limits or Throttling).
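The core of the flaw, allocation proportional to an attacker-controlled n with no cap, can be sketched as follows. This is an illustrative Python model, not vLLM's actual code; the names SampleState, expand_request_unbounded, expand_request_bounded, and the MAX_N value are all hypothetical.

```python
from dataclasses import dataclass

MAX_N = 128  # hypothetical server-side cap on samples per request


@dataclass
class SampleState:
    """Stand-in for the per-sample bookkeeping a server allocates up front."""
    request_id: str
    index: int


def expand_request_unbounded(request_id: str, n: int) -> list:
    # Vulnerable pattern (CWE-770): one object per requested sample with no
    # upper bound, so a single request with a huge n allocates millions of
    # objects before scheduling ever begins.
    return [SampleState(request_id, i) for i in range(n)]


def expand_request_bounded(request_id: str, n: int) -> list:
    # Fixed pattern: validate n before any allocation happens.
    if not 1 <= n <= MAX_N:
        raise ValueError(f"n must be between 1 and {MAX_N}, got {n}")
    return [SampleState(request_id, i) for i in range(n)]
```

The essential design point is that validation must run before the first allocation; rejecting the request after building the per-sample objects would not prevent the memory exhaustion.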
Affected Systems
The vulnerability affects the vLLM inference and serving engine maintained by the vllm‑project organization. Any deployment running versions 0.1.0 through 0.18.x (i.e., anything before 0.19.0) is susceptible whenever the API server is reachable from untrusted clients. Versions 0.19.0 and later contain the fix.
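The affected range stated above can be checked programmatically against a deployed version string. A stdlib-only sketch using tuple comparison, which is adequate for plain X.Y.Z version strings (it does not handle pre-release suffixes):

```python
def parse_version(v: str) -> tuple:
    """Turn 'X.Y.Z' into a comparable tuple of ints, e.g. (0, 18, 2)."""
    return tuple(int(part) for part in v.split("."))


def is_affected(version: str) -> bool:
    # Affected range from the advisory: >= 0.1.0 and < 0.19.0.
    return (0, 1, 0) <= parse_version(version) < (0, 19, 0)
```

For example, is_affected("0.18.2") is True while is_affected("0.19.0") is False.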
Risk and Exploitability
A CVSS base score of 6.5 places the flaw in the medium-severity range. No publicly documented exploits are currently known, but the flaw is trivial to trigger: a single unauthenticated HTTP request over the public API interface suffices. An attacker needs no credentials, and the impact is confined to availability, a sudden Out‑of‑Memory crash that takes the service offline.
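Until an upgrade to a fixed version is possible, oversized requests can be screened at an edge proxy before they reach vLLM. A minimal sketch, assuming JSON bodies in the OpenAI-compatible /v1/completions shape; the cap value N_CAP and the function name screen_request are illustrative, and the best_of field is included because the OpenAI-style completions API also accepts it:

```python
import json

N_CAP = 64  # hypothetical limit enforced before forwarding to vLLM


def screen_request(raw_body: bytes) -> bool:
    """Return True if the request may be forwarded, False if it must be rejected."""
    try:
        body = json.loads(raw_body)
    except ValueError:  # malformed JSON never reaches the backend
        return False
    if not isinstance(body, dict):
        return False
    n = body.get("n", 1)
    best_of = body.get("best_of", n)
    # Reject non-integer or out-of-range sample counts up front.
    return (
        isinstance(n, int)
        and isinstance(best_of, int)
        and 1 <= n <= N_CAP
        and 1 <= best_of <= N_CAP
    )
```

Rejecting the request at the edge keeps the oversized n from ever reaching the process that would otherwise allocate per-sample state for it.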
OpenCVE Enrichment
Source: GitHub GHSA (GitHub Security Advisories)