Description
OOM Denial of Service via Unbounded Array Allocation in Apache OpenNLP AbstractModelReader 

Versions Affected:

* before 2.5.9
* 3.x before 3.0.0-M3

Description:


The AbstractModelReader methods getOutcomes(), getOutcomePatterns(), and getPredicates() each read a 32-bit signed integer count field from a binary model stream and pass that value directly to an array allocation (new String[numOutcomes], new int[numOCTypes][], new String[NUM_PREDS]) without validating that the value is non-negative or within a reasonable bound. The count is therefore fully attacker-controlled when the model file originates from an untrusted source.
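The pattern described above can be illustrated with a minimal sketch. This is not the OpenNLP source: the class `UncheckedCountSketch` and the method `readLabelsUnchecked` are hypothetical names, and the sketch assumes the stream is read with `java.io.DataInputStream`-style `readInt()` and `readUTF()` calls.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of the flawed pattern; not OpenNLP's actual code.
final class UncheckedCountSketch {

    // Reads a count followed by that many strings, trusting the count blindly.
    static String[] readLabelsUnchecked(byte[] stream) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            int count = in.readInt();            // 32-bit signed value taken from the file as-is
            String[] labels = new String[count]; // array sized directly by untrusted input
            for (int i = 0; i < count; i++) {
                labels[i] = in.readUTF();        // label data is only consumed after allocation
            }
            return labels;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a negative count fails with a NegativeArraySizeException, while a very large count fails at the `new String[count]` line with an OutOfMemoryError, before any label bytes are read.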


A crafted .bin model file in which any of these count fields is set to Integer.MAX_VALUE (or any value large enough to exhaust the available heap) triggers an OutOfMemoryError at the array allocation itself, before the corresponding label or pattern data is consumed from the stream. The error occurs very early in deserialization: for a GIS model, getOutcomes() is reached after only the model-type string, the correction constant, and the correction parameter have been read. The attacker therefore does not need a large payload, and a single small file can crash a JVM that loads it. Any code path that deserializes a .bin model is affected, including direct use of GenericModelReader and any higher-level component that delegates to it during model load.


The practical impact is denial of service against processes that load model files from untrusted or semi-trusted origins.  


Mitigation:



* 2.x users should upgrade to 2.5.9.

* 3.x users should upgrade to 3.0.0-M3.




Note: The fix introduces an upper bound on each of the three count fields, checked before array allocation; counts that are negative or exceed the bound cause an IllegalArgumentException to be thrown and the read to fail fast with no large allocation. The default bound is 10,000,000, which is well above the entry counts of legitimate OpenNLP models but far below any value that would threaten heap exhaustion. Deployments that legitimately need to load models with more entries than the default can raise the limit at JVM startup by setting the OPENNLP_MAX_ENTRIES system property to the desired positive integer (e.g. -DOPENNLP_MAX_ENTRIES=50000000); invalid or non-positive values fall back to the default.
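The behavior described in the note can be sketched roughly as follows. This is an assumption-laden illustration, not the code shipped in 2.5.9 or 3.0.0-M3: the class `BoundedCountSketch` and the methods `resolveMaxEntries` and `checkCount` are hypothetical names; only the property name, the default of 10,000,000, the fallback rule, and the exception type come from the advisory text.

```java
// Hypothetical sketch of a bounded count check; not the actual OpenNLP fix.
final class BoundedCountSketch {

    static final String MAX_ENTRIES_PROPERTY = "OPENNLP_MAX_ENTRIES";
    static final int DEFAULT_MAX_ENTRIES = 10_000_000;

    // Parses the configured limit; invalid or non-positive values fall back to the default.
    static int resolveMaxEntries(String raw) {
        if (raw == null) {
            return DEFAULT_MAX_ENTRIES;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_MAX_ENTRIES;
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_ENTRIES;
        }
    }

    // Rejects negative or over-limit counts before any array is allocated.
    static int checkCount(int count, int maxEntries, String fieldName) {
        if (count < 0 || count > maxEntries) {
            throw new IllegalArgumentException(
                    fieldName + " count " + count + " is outside 0.." + maxEntries);
        }
        return count;
    }
}
```

A reader following this sketch would call `checkCount(in.readInt(), resolveMaxEntries(System.getProperty(BoundedCountSketch.MAX_ENTRIES_PROPERTY)), "outcomes")` and only then allocate the array, so a hostile count fails fast without a large allocation.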


Users who cannot upgrade immediately should treat all .bin model files as untrusted input unless their provenance is verified, and should avoid loading models supplied by end users or fetched from third-party repositories without integrity checks.
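One way to apply the integrity-check advice is to compare a model file's SHA-256 digest against a value pinned from a trusted source before handing the file to any model reader. The sketch below is only an illustration under that assumption: `ModelDigestSketch`, `sha256Hex`, and `matchesPinnedDigest` are hypothetical names, and how the expected digest is obtained and stored is left to the deployment.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for checking a model file against a pinned SHA-256 digest.
final class ModelDigestSketch {

    // Returns the lowercase hex SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // True only if the bytes hash to the digest recorded from a trusted source.
    static boolean matchesPinnedDigest(byte[] modelBytes, String expectedHex) {
        return sha256Hex(modelBytes).equalsIgnoreCase(expectedHex.trim());
    }
}
```

A caller could read the file with `java.nio.file.Files.readAllBytes(path)`, refuse to proceed when `matchesPinnedDigest` returns false, and only then pass the bytes to the model loader. This check establishes that the file is the expected one; it does not by itself make an unknown model safe to load.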
Published: 2026-05-04
Score: n/a
EPSS: n/a
KEV: No
Impact: n/a
Action: n/a
AI Analysis

Impact

The flaw arises because AbstractModelReader reads a signed 32-bit integer from a binary model file and passes it directly to an array allocation without checking for negative or excessively large values. An attacker can craft a model file that sets any of the count fields to Integer.MAX_VALUE or another large value, causing an OutOfMemoryError early in deserialization. The weakness is classified as CWE-789 (memory allocation with an excessive size value) and leads to a denial-of-service condition when a vulnerable JVM attempts to load the malicious model.

Affected Systems

The vulnerability affects Apache OpenNLP versions before 2.5.9 and before 3.0.0-M3. Any component that loads a .bin model file, such as GenericModelReader or a higher-level utility that delegates to it, is impacted if it processes models from untrusted or semi-trusted origins.

Risk and Exploitability

The risk is high in environments that accept model files supplied by users or third parties, because an attacker can trigger a crash with a single small file. No CVSS score or EPSS score is available yet, and the vulnerability is not listed in CISA's KEV catalog; the absence of these metrics does not make a denial-of-service attack less likely. The attack vector depends on how the application obtains models: remote if it accepts model files over a network, local if an attacker must place the file on the host.

Generated by OpenCVE AI on May 4, 2026 at 19:08 UTC.

Remediation

Vendor fix available: upgrade to Apache OpenNLP 2.5.9 (2.x) or 3.0.0-M3 (3.x), as stated in the Description.

OpenCVE Recommended Actions

  • Upgrade to Apache OpenNLP 2.5.9 (2.x) or 3.0.0-M3 (3.x) so that count values are bounded before array allocation.
  • Do not load .bin model files that originate from untrusted sources; verify the provenance or integrity of the file before deserialization.
  • If larger model entry counts are required, set the OPENNLP_MAX_ENTRIES system property at JVM startup (e.g., -DOPENNLP_MAX_ENTRIES=50000000), keeping the value positive and small enough for the available heap.



Advisories

No advisories yet.

History

Mon, 04 May 2026 19:30:00 +0000

Type Values Removed Values Added
First Time appeared Apache
Apache opennlp
Vendors & Products Apache
Apache opennlp

Mon, 04 May 2026 18:30:00 +0000

Type Values Removed Values Added
References

Mon, 04 May 2026 17:15:00 +0000

Type Values Removed Values Added
Description (added; text identical to the Description section above)
Title Apache OpenNLP: OOM DoS via Unbounded Array Allocation in AbstractModelReader
Weaknesses CWE-789
References

MITRE

Status: PUBLISHED

Assigner: apache

Published:

Updated: 2026-05-04T17:37:00.275Z

Reserved: 2026-04-27T12:43:14.347Z

Link: CVE-2026-42440

Vulnrichment

No data.

NVD

Status: Received

Published: 2026-05-04T17:16:26.147

Modified: 2026-05-04T18:16:32.123

Link: CVE-2026-42440

Redhat

No data.

OpenCVE Enrichment

Updated: 2026-05-04T19:15:06Z

Weaknesses