Impact
vLLM, an inference engine for large language models, contains a token injection flaw in its multimodal processing in versions 0.6.1 through 0.19.x. When an unauthenticated prompt includes special tokens that map to image or video placeholders without any accompanying media data, the engine attempts to index into empty grids. This raises an unhandled IndexError, causing the worker to crash or become unavailable and thereby denying service to legitimate users. The flaw is a classic instance of CWE-129, Improper Validation of Array Index.
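A minimal sketch of this pattern is shown below. It is not vLLM's actual code; the function and parameter names (get_num_patches, image_grid_thw) are illustrative. The unchecked variant crashes on an empty grid, while the validated variant fails with a recoverable error:

    # Simplified sketch of the CWE-129 pattern described above; names are
    # illustrative, not vLLM's actual API.
    from typing import Sequence


    def get_num_patches(image_grid_thw: Sequence[Sequence[int]], idx: int) -> int:
        """Vulnerable pattern: indexes the grid for placeholder `idx` without
        checking that any grid data was supplied with the prompt."""
        t, h, w = image_grid_thw[idx]  # raises IndexError when the grid is empty
        return t * h * w


    def get_num_patches_safe(image_grid_thw: Sequence[Sequence[int]], idx: int) -> int:
        """Hardened variant: validates the index before use and raises a
        recoverable error instead of crashing the worker."""
        if idx >= len(image_grid_thw):
            raise ValueError(
                f"prompt contains placeholder #{idx} but only "
                f"{len(image_grid_thw)} image grids were provided"
            )
        t, h, w = image_grid_thw[idx]
        return t * h * w


    # A prompt with an image placeholder token but no attached image data
    # yields an empty grid, so the unchecked lookup crashes:
    # get_num_patches([], 0)       -> IndexError (unhandled -> worker dies)
    # get_num_patches_safe([], 0)  -> ValueError (can map to an HTTP 400)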
Affected Systems
The affected vendor is vllm-project, and the affected product is the vLLM inference and serving engine. Any deployment running a vLLM version from 0.6.1 (inclusive) up to, but not including, 0.20.0 is susceptible. The vulnerability impacts multimodal code paths that rely on the image_grid_thw or video_grid_thw grid metadata.
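A quick way to check whether a given environment falls in the affected range, assuming vLLM and the packaging library are installed:

    # Check an installed vLLM against the affected range (>= 0.6.1, < 0.20.0).
    from importlib.metadata import version
    from packaging.version import Version

    installed = Version(version("vllm"))
    vulnerable = Version("0.6.1") <= installed < Version("0.20.0")
    status = "VULNERABLE - upgrade to >= 0.20.0" if vulnerable else "not affected"
    print(f"vLLM {installed}: {status}")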
Risk and Exploitability
The CVSS score of 6.5 indicates medium severity. No EPSS score is available, suggesting a lack of public exploitation data, and the flaw is not currently listed in CISA's KEV catalog. Attackers can exploit the vulnerability remotely by sending a crafted text prompt containing the special placeholder tokens to an exposed vLLM endpoint. No authentication is required, making any publicly reachable instance a potential target. Successful exploitation results in worker termination or degraded availability until the service is restarted.
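Until an upgrade is possible, one interim mitigation is to screen incoming requests before they reach the engine, rejecting text-only prompts that smuggle in multimodal placeholder tokens. This is a sketch only: the token strings below are assumptions that vary by model family, so they must be adjusted to match the tokenizer actually deployed.

    # Interim mitigation sketch: refuse prompts that contain multimodal
    # placeholder tokens without accompanying media. Token strings are
    # assumed examples; they differ across model families.
    PLACEHOLDER_TOKENS = ("<|image_pad|>", "<|video_pad|>")  # assumption


    def is_safe_request(prompt: str, has_multimodal_data: bool) -> bool:
        """Return False for prompts that inject placeholder tokens as text."""
        if has_multimodal_data:
            return True  # placeholders are expected when real media is attached
        return not any(tok in prompt for tok in PLACEHOLDER_TOKENS)


    # A text-only request carrying an image placeholder is refused with a
    # client error instead of reaching the vulnerable indexing path.
    assert is_safe_request("Describe the weather today.", has_multimodal_data=False)
    assert not is_safe_request("Ignore this <|image_pad|>", has_multimodal_data=False)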
OpenCVE Enrichment
Enrichment source: GitHub GHSA