Description
XML External Entity (XXE) via Unsanitized Dictionary Parsing in Apache OpenNLP DictionaryEntryPersistor


Versions Affected: before 2.5.9, before 3.0.0-M3


Description: The DictionaryEntryPersistor class initializes a static SAXParserFactory at class-load time without enabling FEATURE_SECURE_PROCESSING or disabling DTD processing. When create(InputStream, EntryInserter) is invoked, the only feature set on the XMLReader is namespace support; external entity resolution and DOCTYPE declarations remain fully enabled.

An attacker who can supply a crafted dictionary file (e.g., a stop-word list or domain dictionary) containing a malicious DOCTYPE declaration can trigger local file disclosure via file:// entity references or server-side request forgery via http:// entity references during SAX parsing, before the application processes a single dictionary entry.

This is inconsistent with the project's own XmlUtil.createSaxParser() helper, which correctly sets FEATURE_SECURE_PROCESSING and disallow-doctype-decl and is used by all other XML parsing paths in the codebase. The public Dictionary(InputStream) constructor delegates directly to this method and is the documented API for loading user-supplied dictionaries, making untrusted input a realistic scenario.
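For comparison, the following is a minimal sketch of a SAX parser configured along the lines the advisory attributes to XmlUtil.createSaxParser(): FEATURE_SECURE_PROCESSING enabled and disallow-doctype-decl set. It is not OpenNLP's actual code. The class name HardenedSaxParsing, the parsesSafely helper and the extra external-entity switches are illustrative assumptions, and the sketch assumes the JDK's built-in JAXP implementation, which recognizes the Apache disallow-doctype-decl feature URI.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative hardened SAX setup; hypothetical, not taken from Apache OpenNLP. */
public final class HardenedSaxParsing {
  private HardenedSaxParsing() {}

  /** Builds a SAX parser that refuses DOCTYPE declarations and external entities. */
  public static SAXParser newHardenedParser() throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    // Namespace support: the only feature the advisory says the vulnerable path sets.
    factory.setNamespaceAware(true);
    // The two settings the advisory attributes to XmlUtil.createSaxParser().
    factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    // Belt and braces: also switch off external entity loading and XInclude.
    factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
    factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
    factory.setXIncludeAware(false);
    return factory.newSAXParser();
  }

  /** Returns true if the document parses, false if the hardened parser rejects it. */
  public static boolean parsesSafely(String xml) throws Exception {
    InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    try {
      newHardenedParser().parse(in, new DefaultHandler());
      return true;
    } catch (SAXParseException e) {
      // Thrown as a fatal error when a DOCTYPE is encountered.
      return false;
    }
  }
}
```

With this configuration the parser fails at the DOCTYPE itself, so no file:// or http:// entity is ever resolved.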


Mitigation: 2.x users should upgrade to 2.5.9. 3.x users should upgrade to 3.0.0-M3. Users who cannot upgrade immediately should ensure that all dictionary files are sourced from trusted origins and should consider wrapping the Dictionary(InputStream) constructor with input validation that rejects any XML containing a DOCTYPE declaration before it reaches the parser.
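The DOCTYPE-rejecting wrapper suggested in the mitigation could look roughly like the sketch below. DoctypeGuard and rejectDoctype are hypothetical names, not part of OpenNLP. The sketch buffers the whole stream (so it assumes dictionaries fit in memory) and refuses any input containing "<!DOCTYPE". Because it is a byte-level heuristic rather than a parser, it also rejects NUL bytes and documents that do not start with an ASCII '<', so that UTF-16, UTF-32 or EBCDIC encodings cannot hide the declaration from an ASCII search.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical pre-validation wrapper for dictionary XML; not part of Apache OpenNLP. */
public final class DoctypeGuard {
  private DoctypeGuard() {}

  /** Buffers the stream and returns a replayable copy, or throws if a DOCTYPE may be present. */
  public static InputStream rejectDoctype(InputStream in) throws IOException {
    byte[] bytes = in.readAllBytes();

    // NUL bytes indicate UTF-16/UTF-32, where "<!DOCTYPE" would not appear as plain ASCII.
    for (byte b : bytes) {
      if (b == 0) {
        throw new IOException("Rejected: dictionary XML is not in an ASCII-compatible encoding");
      }
    }

    // Skip an optional UTF-8 byte order mark and leading whitespace.
    int pos = 0;
    if (bytes.length >= 3
        && (bytes[0] & 0xFF) == 0xEF && (bytes[1] & 0xFF) == 0xBB && (bytes[2] & 0xFF) == 0xBF) {
      pos = 3;
    }
    while (pos < bytes.length
        && (bytes[pos] == ' ' || bytes[pos] == '\t' || bytes[pos] == '\r' || bytes[pos] == '\n')) {
      pos++;
    }
    // Require an ASCII '<' next; this also excludes EBCDIC, where '<' is a different byte.
    if (pos < bytes.length && bytes[pos] != '<') {
      throw new IOException("Rejected: dictionary XML does not start with '<'");
    }

    // ISO-8859-1 maps every byte to exactly one char, so ASCII markup survives the decode.
    String text = new String(bytes, StandardCharsets.ISO_8859_1);
    if (text.contains("<!DOCTYPE")) {
      throw new IOException("Rejected: DOCTYPE declaration found in dictionary XML");
    }
    return new ByteArrayInputStream(bytes);
  }
}
```

The returned stream would then be handed to the Dictionary(InputStream) constructor, e.g. new Dictionary(DoctypeGuard.rejectDoctype(in)). Upgrading remains the actual fix; a text scan can also produce false positives, for example when a dictionary mentions "<!DOCTYPE" inside a comment.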
Published: 2026-05-04
Score: n/a
EPSS: n/a
KEV: No
Impact: n/a
Action: n/a
AI Analysis

Impact

Apache OpenNLP's DictionaryEntryPersistor initializes a SAXParserFactory that permits external entity resolution and DOCTYPE declarations, creating an XXE vulnerability (CWE-611). When the public Dictionary(InputStream) constructor processes a user-supplied dictionary file, an attacker can embed a malicious DOCTYPE that reads local files via file:// entity references or triggers server-side request forgery via http:// entity references. Either path can expose sensitive files or internal resources before any dictionary entry is processed. The primary impact is on confidentiality; the SSRF vector can additionally let an attacker reach internal hosts that are not otherwise exposed.

Affected Systems

The vulnerability affects Apache OpenNLP releases prior to 2.5.9 and, on the 3.x line, releases prior to 3.0.0-M3. The affected product is the Apache Software Foundation's OpenNLP library. Deployments running these releases are at risk when they load dictionaries through the public Dictionary(InputStream) API, particularly if dictionary contents can come from untrusted sources.

Risk and Exploitability

No EPSS score is available and the vulnerability is not listed in the CISA KEV catalog. Because the parser is not hardened, the flaw is exploitable whenever dictionary files are not restricted to trusted sources. Exploitation requires the attacker to supply a crafted dictionary file, which is realistic for deployments that load user-supplied dictionaries. No CVSS score is provided, but an XXE that can read local files and reach internal or remote hosts should be treated as a serious confidentiality risk.

Generated by OpenCVE AI on May 4, 2026 at 18:55 UTC.

Remediation

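Fixed releases are available: Apache OpenNLP 2.5.9 for the 2.x line and 3.0.0-M3 for the 3.x line. As an interim workaround, the vendor advises loading dictionaries only from trusted sources and rejecting any dictionary XML that contains a DOCTYPE declaration before it reaches the parser.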

OpenCVE Recommended Actions

  • Upgrade Apache OpenNLP to release 2.5.9 or later (for 2.x) or to 3.0.0‑M3 or later (for 3.x).
  • Validate that all dictionary files come from trusted, authenticated sources before they reach the parsing component; consider refusing to load files from unknown or external locations.
  • Implement a pre‑validation wrapper around the Dictionary(InputStream) constructor that rejects any XML containing a DOCTYPE declaration, preventing the parser from processing external entities.
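For the second recommendation, one way to make "trusted, authenticated sources" concrete is to pin the SHA-256 digests of approved dictionary files and refuse anything else. This is an illustrative sketch, not an OpenNLP feature: PinnedDictionarySource is a hypothetical class, and the allowlist of digests is assumed to be maintained by the deployer. It uses only the JDK (java.util.HexFormat requires Java 17 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical allowlist check for dictionary files; not part of Apache OpenNLP. */
public final class PinnedDictionarySource {
  private final Set<String> allowedSha256Hex;

  /** Digests are hex strings; they are normalized to lower case for comparison. */
  public PinnedDictionarySource(Set<String> allowedSha256Hex) {
    this.allowedSha256Hex = allowedSha256Hex.stream()
        .map(s -> s.toLowerCase(Locale.ROOT))
        .collect(Collectors.toUnmodifiableSet());
  }

  /** Buffers the stream and returns a replayable copy only if its digest is allowlisted. */
  public InputStream verify(InputStream in) throws IOException {
    byte[] bytes = in.readAllBytes();
    String digest = sha256Hex(bytes);
    if (!allowedSha256Hex.contains(digest)) {
      throw new IOException("Dictionary not on allowlist: sha256=" + digest);
    }
    return new ByteArrayInputStream(bytes);
  }

  /** Lower-case hex SHA-256 of the given bytes. */
  public static String sha256Hex(byte[] data) {
    try {
      return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide SHA-256.
      throw new IllegalStateException(e);
    }
  }
}
```

Digest pinning suits a fixed set of shipped or centrally managed dictionaries; deployments that accept arbitrary user uploads cannot pin digests in advance and should rely on the upgrade and the DOCTYPE check instead.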

Advisories

No advisories yet.

History

Mon, 04 May 2026 19:15:00 +0000

  • First Time appeared: added Apache; Apache opennlp
  • Vendors & Products: added Apache; Apache opennlp

Mon, 04 May 2026 18:30:00 +0000

  • References (values not shown)

Mon, 04 May 2026 17:15:00 +0000

  • Description: added (identical to the Description at the top of this page)
  • Title: added "Apache OpenNLP: XXE via Dictionary Parsing in DictionaryEntryPersistor"
  • Weaknesses: added CWE-611
  • References (values not shown)

MITRE

Status: PUBLISHED

Assigner: apache

Published:

Updated: 2026-05-04T17:36:52.681Z

Reserved: 2026-04-14T17:21:09.189Z

Link: CVE-2026-40682

Vulnrichment

No data.

NVD

Status: Received

Published: 2026-05-04T17:16:23.657

Modified: 2026-05-04T18:16:29.337

Link: CVE-2026-40682

Redhat

No data.

OpenCVE Enrichment

Updated: 2026-05-04T19:00:07Z

Weaknesses