CVE-2024-5206 - Vulnerability Details

- Sensitive Data Leakage in sklearn.feature_extraction.text.TfidfVectorizer in scikit-learn/scikit-learn

Description

A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer, specifically in versions up to and including 1.4.1.post1, which was fixed in version 1.5.0. The vulnerability arises from the unexpected storage of all tokens present in the training data within the `stop_words_` attribute, rather than only storing the subset of tokens required for the TF-IDF technique to function. This behavior leads to the potential leakage of sensitive information, as the `stop_words_` attribute could contain tokens that were meant to be discarded and not stored, such as passwords or keys. The impact of this vulnerability varies based on the nature of the data being processed by the vectorizer.
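
For fitted vectorizers that were produced by an affected release and must still be serialized or shared, one possible interim measure is to clear `stop_words_` before persisting the object. The sketch below assumes the attribute is used only for introspection, as the scikit-learn documentation for the affected releases describes it. The helper name `scrub_stop_words` and the stand-in class `FakeFittedVectorizer` are hypothetical and exist only so that the example runs without scikit-learn installed.

```python
def scrub_stop_words(vectorizer):
    """Clear a fitted vectorizer's `stop_words_` and return how many tokens were dropped.

    Works on any object; objects without the attribute are left untouched.
    """
    leaked = getattr(vectorizer, "stop_words_", None)
    dropped = len(leaked) if leaked else 0
    if hasattr(vectorizer, "stop_words_"):
        # Setting the attribute to None keeps the object shape stable
        # while discarding the stored tokens.
        vectorizer.stop_words_ = None
    return dropped


class FakeFittedVectorizer:
    """Hypothetical stand-in for a fitted TfidfVectorizer (illustration only)."""

    def __init__(self):
        self.vocabulary_ = {"hello": 0, "world": 1}
        # Tokens that were discarded during fitting but still stored.
        self.stop_words_ = {"hunter2", "api-key-123"}


vec = FakeFittedVectorizer()
dropped = scrub_stop_words(vec)
print(dropped, vec.stop_words_, vec.vocabulary_)
```

With a real fitted `TfidfVectorizer`, the same helper would be called on the vectorizer immediately before pickling it. The learned vocabulary is stored separately in `vocabulary_` and is not modified by this sketch.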

Published: 2024-06-06

Score: 4.7 Medium

EPSS: < 1% Very Low

KEV: No

Analysis

No analysis available yet.

Default status is the baseline for the product, each version can override it (e.g. patched versions marked unaffected).

Vendor	Product	Default status
scikit-learn	scikit-learn/scikit-learn	affected

Versions

Version	Status	Constraints
`unspecified`	affected	< 1.5.0
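
The `< 1.5.0` constraint can be checked against a version string with a few lines of standard-library Python. This is a minimal sketch, not an official detection tool: the names `FIXED_RELEASE`, `release_tuple`, and `is_affected` are hypothetical, and only the leading numeric release segment is compared, so a pre-release such as `1.5.0rc1` is treated like `1.5.0`.

```python
import re

# First release reported as fixed in the record above.
FIXED_RELEASE = (1, 5, 0)


def release_tuple(version):
    """Extract the leading numeric release segment, e.g. '1.4.1.post1' -> (1, 4, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as '1.5' to three components.
    return parts + (0,) * (3 - len(parts))


def is_affected(version):
    """Return True when the release segment is lower than the fixed release."""
    return release_tuple(version) < FIXED_RELEASE


for candidate in ("1.4.1.post1", "1.5.0", "0.24.2"):
    print(candidate, is_affected(candidate))
```

In practice the installed version string could be obtained with `importlib.metadata.version("scikit-learn")` and passed to `is_affected`.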

Configuration 1

cpe:2.3:a:scikit-learn:scikit-learn:*:*:*:*:*:python:*:*

Remediation

Upgrade to scikit-learn 1.5.0 or later, the release in which the description above reports the issue as fixed.

Advisories

Source	ID	Title
EUVD	EUVD-2024-0161	A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer, specifically in versions up to and including 1.4.1.post1, which was fixed in version 1.5.0. The vulnerability arises from the unexpected storage of all tokens present in the training data within the `stop_words_` attribute, rather than only storing the subset of tokens required for the TF-IDF technique to function. This behavior leads to the potential leakage of sensitive information, as the `stop_words_` attribute could contain tokens that were meant to be discarded and not stored, such as passwords or keys. The impact of this vulnerability varies based on the nature of the data being processed by the vectorizer.
GitHub GHSA	GHSA-jw8x-6495-233v	scikit-learn sensitive data leakage vulnerability

No CVSS v4.0

CVSS v3.1

Metric	Value
Attack Vector	Local
Attack Complexity	High
Privileges Required	Low
User Interaction	None
Scope	Unchanged
Confidentiality Impact	High
Integrity Impact	None
Availability Impact	None

No CVSS v2

This CVE is not in the KEV list.

The EPSS score is 0.00187.

Key SSVC decision points have not yet been added.

References

Link	Providers
https://github.com/scikit-learn/scikit-learn/commit/70ca21f106b603b611da73012c9ade7cd8e438b8
https://huntr.com/bounties/14bc0917-a85b-4106-a170-d09d5191517c
https://nvd.nist.gov/vuln/detail/CVE-2024-5206
https://www.cve.org/CVERecord?id=CVE-2024-5206

History

Tue, 15 Jul 2025 13:45:00 +0000

Type	Values Removed	Values Added
Metrics	epss `{'score': 0.00029}`	epss `{'score': 0.00032}`

Thu, 24 Oct 2024 20:15:00 +0000

Type	Values Removed	Values Added
First Time appeared		Scikit-learn Scikit-learn scikit-learn
Weaknesses		CWE-922
CPEs		cpe:2.3:a:scikit-learn:scikit-learn:*:*:*:*:*:python:*:*
Vendors & Products		Scikit-learn Scikit-learn scikit-learn
Metrics		cvssV3_1 `{'score': 4.7, 'vector': 'CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N'}`
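
The 4.7 base score can be reproduced from the vector string above using the CVSS v3.1 base-score equations. The sketch below is an illustration rather than a full CVSS implementation: it covers base metrics only, performs no input validation, and the names `WEIGHTS`, `roundup`, and `base_score` are chosen for this example.

```python
import math

# Numeric weights for CVSS v3.1 base metrics.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'CVSS:3.1/AV:L/AC:H/...'."""
    # Skip the 'CVSS:3.1' prefix and map metric names to their values.
    fields = dict(part.split(":") for part in vector.split("/")[1:])
    changed = fields["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[fields["PR"]]

    iss = 1 - (
        (1 - WEIGHTS["C"][fields["C"]])
        * (1 - WEIGHTS["I"][fields["I"]])
        * (1 - WEIGHTS["A"][fields["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22
        * WEIGHTS["AV"][fields["AV"]]
        * WEIGHTS["AC"][fields["AC"]]
        * pr
        * WEIGHTS["UI"][fields["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


score = base_score("CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N")
print(score)  # 4.7
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.55 × 0.44 × 0.62 × 0.85 ≈ 1.05; their sum, about 4.64, rounds up to 4.7.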

MITRE

Status: PUBLISHED

Assigner: @huntr_ai

Published: 2024-06-06T18:28:14.267Z

Updated: 2024-08-01T21:03:11.034Z

Reserved: 2024-05-22T15:52:49.284Z

Link: CVE-2024-5206

Vulnrichment

Updated: 2024-08-01T21:03:11.034Z

NVD

Status: Modified

Published: 2024-06-06T19:16:06.363

Modified: 2026-06-17T08:15:24.690

Link: CVE-2024-5206

Redhat

Severity: Moderate

Public Date: 2024-06-06T00:00:00Z

Links: CVE-2024-5206 - Bugzilla

OpenCVE Enrichment

No data.

Weaknesses

CWE-921
Storage of Sensitive Data in a Mechanism without Access Control
CWE-922
Insecure Storage of Sensitive Information
