Impact
NLTK's load function accepts URLs with the 'nltk:' scheme. Prior to version 3.10.0‑rc1, the function checks the input path for traversal characters before decoding percent‑encoded sequences. Because of this decode‑after‑check flaw, an encoded path such as %2fetc%2fpasswd contains nothing the regex guard recognizes as traversal, yet it decodes to /etc/passwd afterwards. An attacker can therefore supply a URL that resolves to any path on the local filesystem and read arbitrary files. The vulnerability falls under CWE‑22, a classic path traversal weakness that compromises confidentiality.
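The ordering problem described above can be illustrated with a minimal sketch. The regex and both function names are hypothetical and are not taken from NLTK's source; the sketch only contrasts checking before decoding with decoding before checking.

```python
import re
from urllib.parse import unquote

# Hypothetical guard (NOT NLTK's actual pattern): flags a leading '/'
# or any '..' path segment.
_TRAVERSAL = re.compile(r"(^/)|((^|/)\.\.(/|$))")


def _strip_scheme(url: str) -> str:
    """Return the part of the URL after the 'nltk:' scheme."""
    if not url.startswith("nltk:"):
        raise ValueError("expected an 'nltk:' URL")
    return url[len("nltk:"):]


def resolve_check_then_decode(url: str) -> str:
    """Flawed order: validate the encoded text, then decode it."""
    path = _strip_scheme(url)
    if _TRAVERSAL.search(path):
        raise ValueError("unsafe path")
    return unquote(path)  # '%2f' becomes '/' only after the check has passed


def resolve_decode_then_check(url: str) -> str:
    """Safer order: decode first, then validate what will actually be used."""
    path = unquote(_strip_scheme(url))
    if _TRAVERSAL.search(path):
        raise ValueError("unsafe path")
    return path
```

With this sketch, resolve_check_then_decode("nltk:%2fetc%2fpasswd") returns "/etc/passwd", while resolve_decode_then_check raises ValueError for the same input. A single decode pass does not address doubly encoded input, so a real fix would also need to validate the final resolved path.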
Affected Systems
The flaw affects the NLTK Python library; all versions released before 3.10.0‑rc1 are vulnerable. Applications that call nltk.data.load() with untrusted input are impacted. Upgrading to 3.10.0‑rc1 or later removes the unsafe path validation logic.
Risk and Exploitability
The CVSS score of 7.5 indicates high severity. EPSS is not available, and the issue is not listed in CISA's KEV, suggesting no known widespread exploitation yet. However, the vulnerability can be leveraged in any environment where an application calls nltk.data.load() with attacker‑controlled data, allowing local file reads that could expose sensitive files or internal configuration. The attack requires only the ability to supply the load URL; no additional privileges are needed beyond the application's running context.
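Where an application must pass user-influenced names to nltk.data.load() and cannot upgrade immediately, one possible stopgap is to validate the name against a strict allow-list first. The helper below is a hypothetical sketch, not part of NLTK; the permitted character set is an assumption and should be adjusted to the resources an application actually loads.

```python
import re

# Assumed allow-list: relative, slash-separated segments made of letters,
# digits, '_', '-' and '.', each starting with a letter, digit or '_'.
# '%' is not permitted, so percent-encoded input is rejected outright.
_RESOURCE_NAME = re.compile(
    r"[A-Za-z0-9_][A-Za-z0-9_.\-]*(/[A-Za-z0-9_][A-Za-z0-9_.\-]*)*"
)


def validate_resource_name(name: str) -> str:
    """Return name unchanged if it matches the allow-list, else raise ValueError."""
    if not _RESOURCE_NAME.fullmatch(name):
        raise ValueError(f"rejected resource name: {name!r}")
    return name
```

Under these assumptions, "tokenizers/punkt/english.pickle" is accepted, while "%2fetc%2fpasswd", "/etc/passwd" and "corpora/../secret" are rejected before the value reaches the loader. This reduces exposure but does not replace upgrading.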