Description
NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. Prior to 3.10.0-rc1, nltk.data.load() in NLTK is vulnerable to path traversal via URL-encoded path separators and traversal segments when using the nltk: URL scheme. The unsafe-path regex check is performed before url2pathname() decodes the %xx sequences (a classic decode-after-check / TOCTOU-style flaw), allowing an attacker to bypass the protection documented in NLTK's SECURITY.md and read arbitrary files from the filesystem. While literal traversal strings such as ../../../etc/passwd are correctly blocked, encoded variants such as %2fetc%2fpasswd, %2e%2e%2f..., and ..%2f..%2f slip past the regex and are subsequently decoded into a real filesystem path. This vulnerability is fixed in 3.10.0-rc1.
Published: 2026-06-22
Score: 7.5 High
EPSS: n/a
KEV: No
Impact: n/a
Action: n/a
AI Analysis

Impact

nltk.data.load() accepts resource URLs that use the nltk: scheme. Prior to version 3.10.0-rc1, the function applies its unsafe-path regex to the raw input and only afterwards decodes percent-encoded sequences. Because of this decode-after-check ordering, an attacker can supply an encoded value such as %2fetc%2fpasswd that passes the regex guard and then decodes to an arbitrary local path, giving read access to any file the process can read. The weakness is classified as CWE-22 (path traversal) and compromises confidentiality.
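The ordering problem can be illustrated with a minimal sketch. The regex and both function names here are hypothetical and are not NLTK's actual code; urllib.parse.unquote stands in for the percent-decoding that the description attributes to url2pathname().

```python
import re
from urllib.parse import unquote

# Hypothetical guard (NOT NLTK's real regex): rejects absolute paths
# and literal ".." path segments.
UNSAFE = re.compile(r"^/|(^|/)\.\.(/|$)")


def check_then_decode(resource: str) -> str:
    """Flawed order: validate the raw string, then decode it."""
    if UNSAFE.search(resource):
        raise ValueError(f"unsafe path: {resource!r}")
    return unquote(resource)


def decode_then_check(resource: str) -> str:
    """Safer order: decode first, then validate the value that is actually used."""
    decoded = unquote(resource)
    if UNSAFE.search(decoded):
        raise ValueError(f"unsafe path: {resource!r}")
    return decoded


if __name__ == "__main__":
    for payload in ("../../../etc/passwd", "%2fetc%2fpasswd", "..%2f..%2fetc%2fpasswd"):
        try:
            print(payload, "->", check_then_decode(payload))
        except ValueError as exc:
            print(payload, "-> blocked:", exc)
```

With the flawed ordering, only the literal ../../../etc/passwd is rejected; the two encoded payloads pass the guard and decode to traversal paths. Running the same payloads through decode_then_check rejects all three.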

Affected Systems

The flaw affects the NLTK Python library; all versions released before 3.10.0-rc1 are vulnerable. Applications that pass untrusted input to nltk.data.load() are impacted. Version 3.10.0-rc1 and later contain the fix.
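A quick way to triage an environment is to compare the installed version string against the fixed release. The sketch below is a simplified, standard-library-only comparison with hypothetical helper names; it assumes version strings of the form X.Y[.Z] with an optional rcN suffix and raises on anything else.

```python
import re

FIXED_VERSION = "3.10.0rc1"  # first fixed release according to the advisory


def _version_key(version: str):
    """Turn 'X.Y[.Z][rcN]' into a sortable tuple (simplified, not full PEP 440)."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.(\d+))?(?:[-.]?rc(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unsupported version format: {version!r}")
    major, minor, patch, rc = match.groups()
    # A release candidate sorts before the final release with the same number.
    rc_rank = (0, int(rc)) if rc is not None else (1, 0)
    return (int(major), int(minor), int(patch or 0), rc_rank)


def is_affected(installed: str) -> bool:
    """Return True if the given NLTK version predates the fixed release."""
    return _version_key(installed) < _version_key(FIXED_VERSION)
```

In practice the installed version could be obtained with importlib.metadata.version("nltk") and passed to is_affected().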

Risk and Exploitability

The CVSS score of 7.5 indicates high severity. No EPSS score is available yet, and the issue is not listed in CISA's KEV catalog, so there is no public evidence of active exploitation. However, the vulnerability can be leveraged in any environment where an application calls nltk.data.load() with attacker-controlled data, allowing reads of local files that could expose sensitive data or internal configuration. The attacker only needs to be able to supply the URL passed to load(); no additional privileges are required, and the files exposed are those readable by the account the application runs as.

Generated by OpenCVE AI on June 22, 2026 at 19:23 UTC.

Remediation

The vendor fix is included in NLTK 3.10.0-rc1. No vendor workaround is provided for earlier versions.

OpenCVE Recommended Actions

  • Upgrade NLTK to version 3.10.0‑rc1 or later.
  • If upgrading is not immediately possible, ensure that any arguments passed to nltk.data.load() are strictly validated and not derived from user input.
  • As a temporary measure, implement file system access restrictions or monitoring to detect unauthorized file reads.
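The input-validation advice above can be sketched as an allowlist check applied before a value ever reaches nltk.data.load(). The helper name and the allowed character set are assumptions for illustration; the sketch assumes the application only needs plain relative resource names, and it rejects anything containing a percent sign, so encoded separators never reach the loader.

```python
import re

# Assumed allowlist: each path segment may contain only these characters.
# '%' is deliberately excluded so percent-encoded input is always rejected.
_SEGMENT = re.compile(r"[A-Za-z0-9_.-]+")


def is_safe_resource_name(name: str) -> bool:
    """Hypothetical pre-check for relative resource names such as 'corpora/brown'."""
    if not name or name.startswith("/"):
        return False
    for segment in name.split("/"):
        # Reject empty segments, '.', '..', and any disallowed character.
        if segment in ("", ".", "..") or not _SEGMENT.fullmatch(segment):
            return False
    return True
```

A caller would invoke nltk.data.load() only when is_safe_resource_name() returns True. Mapping user choices to a fixed set of known resource names is stricter still and avoids passing user-derived strings altogether.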


Advisories
Source ID Title
Github GHSA GHSA-p4gq-832x-fm9v Natural Language Toolkit (NLTK): URL-Encoded Path Traversal in nltk.data.load() Allows Arbitrary Local File Read
History

Mon, 22 Jun 2026 18:45:00 +0000

Type Values Removed Values Added
Description NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. Prior to 3.10.0-rc1, nltk.data.load() in NLTK is vulnerable to path traversal via URL-encoded path separators and traversal segments when using the nltk: URL scheme. The unsafe-path regex check is performed before url2pathname() decodes the %xx sequences (a classic decode-after-check / TOCTOU-style flaw), allowing an attacker to bypass the protection documented in NLTK's SECURITY.md and read arbitrary files from the filesystem. While literal traversal strings such as ../../../etc/passwd are correctly blocked, encoded variants such as %2fetc%2fpasswd, %2e%2e%2f..., and ..%2f..%2f slip past the regex and are subsequently decoded into a real filesystem path. This vulnerability is fixed in 3.10.0-rc1.
Title NLTK: URL-Encoded Path Traversal in nltk.data.load() Allows Arbitrary Local File Read
Weaknesses CWE-22
References
Metrics cvssV3_1

{'score': 7.5, 'vector': 'CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N'}
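As a cross-check, the 7.5 score can be reproduced from the vector using the CVSS v3.1 base score equations. This is a minimal sketch that only handles vectors with Scope Unchanged (S:U), as in this entry; the function names are illustrative.

```python
import math

# CVSS v3.1 metric weights (PR values are those for Scope Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with S:U only."""
    prefix, *parts = vector.split("/")
    metrics = dict(part.split(":") for part in parts)
    if prefix != "CVSS:3.1" or metrics.get("S") != "U":
        raise ValueError("this sketch only handles CVSS:3.1 vectors with S:U")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))  # 7.5
```

For this vector the impact sub-score is about 3.60 and the exploitability sub-score about 3.89; their sum, 7.48, rounds up to 7.5.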


MITRE

Status: PUBLISHED

Assigner: GitHub_M

Published:

Updated: 2026-06-22T17:25:05.611Z

Reserved: 2026-06-12T17:46:37.293Z

Link: CVE-2026-54293

Vulnrichment

No data.

NVD

No data.

Redhat

No data.

OpenCVE Enrichment

Updated: 2026-06-22T19:30:06Z

Weaknesses
  • CWE-22

    Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')