Impact
The CVE involves a discrepancy in the default mono downmix algorithm of the Librosa library, which vLLM relies upon. Librosa averages the channels with numpy.mean, whereas the ITU-R BS.775‑4 standard specifies a weighted downmix. Because of this mismatch, the audio a human listener hears can differ from the audio supplied to the AI model. The model may therefore process audio content differently than expected, potentially leading to incorrect or manipulated inference outcomes.
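The difference between the two downmix strategies can be sketched with NumPy alone. This is a minimal illustration, not the code path used by vLLM or Librosa: the six-channel layout (L, R, C, LFE, Ls, Rs), the weight values, and the function names are assumptions chosen for the example and are not quoted from ITU-R BS.775‑4.

```python
import numpy as np

# Hypothetical 5.1 channel order: L, R, C, LFE, Ls, Rs.
# Illustrative weights only (an assumption for this sketch); the LFE
# channel is given weight 0 so that it is excluded from the mono mix.
ILLUSTRATIVE_WEIGHTS = np.array([0.7071, 0.7071, 1.0, 0.0, 0.5, 0.5])


def mean_downmix(audio: np.ndarray) -> np.ndarray:
    """Unweighted average over channels; audio has shape (channels, samples)."""
    return np.mean(audio, axis=0)


def weighted_downmix(audio: np.ndarray,
                     weights: np.ndarray = ILLUSTRATIVE_WEIGHTS) -> np.ndarray:
    """Weighted sum over channels using per-channel coefficients."""
    return weights @ audio


# One second of audio in which only the LFE channel carries a signal.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
audio = np.zeros((6, sample_rate))
audio[3] = np.sin(2 * np.pi * 440 * t)

mean_mono = mean_downmix(audio)
weighted_mono = weighted_downmix(audio)

# Peak amplitude of each mono signal.
mean_peak = float(np.max(np.abs(mean_mono)))
weighted_peak = float(np.max(np.abs(weighted_mono)))

print(f"mean downmix peak:     {mean_peak:.4f}")
print(f"weighted downmix peak: {weighted_peak:.4f}")
```

Under these assumed weights, content placed only in the excluded channel survives the unweighted average at reduced amplitude but is absent from the weighted mix, so the two mono signals carry different content.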
Affected Systems
vLLM, the open‑source inference engine for large language models, is affected in all releases from version 0.5.5 up to, but not including, 0.18.0. Users running those versions are exposed to the audio downmixing behavior described above.
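The affected range can be checked programmatically. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the parser deliberately looks only at the leading major.minor.patch numbers, so pre-release or local suffixes (for example a release candidate of 0.18.0) are not distinguished from the final release.

```python
import re
from importlib import metadata

# Range stated in the advisory: 0.5.5 <= version < 0.18.0
FIRST_AFFECTED = (0, 5, 5)
FIRST_FIXED = (0, 18, 0)


def parse_release(version: str) -> tuple:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version falls inside the affected range."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED


def installed_vllm_affected():
    """Check the installed vllm package; return None if it is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None


print(is_affected("0.17.1"))  # inside the range
print(is_affected("0.18.0"))  # first fixed release
```

Calling installed_vllm_affected() in the environment that serves the model reports whether the locally installed package falls in the range.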
Risk and Exploitability
The vulnerability carries a CVSS score of 5.9, placing it in the medium severity range, and an EPSS score of less than 1%, indicating a low probability of exploitation in the wild. It is not listed in the CISA KEV catalog. The likely attack vector is an adversary supplying specially crafted audio input that exploits the difference between the two downmix algorithms, thereby influencing the model’s output. No official workaround is provided; the issue is fixed in v0.18.0.
OpenCVE Enrichment