A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer in versions up to and including 1.4.1.post1; it was fixed in version 1.5.0. The vulnerability arises from the unexpected storage of all tokens present in the training data within the `stop_words_` attribute, rather than only the subset of tokens required for the TF-IDF technique to function. As a result, the `stop_words_` attribute could leak sensitive information, because it may contain tokens that were meant to be discarded and not stored, such as passwords or keys. The impact of this vulnerability varies with the nature of the data processed by the vectorizer.
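The mechanism described above can be illustrated with a toy, pure-Python stand-in. This is a sketch under stated assumptions, not scikit-learn's code: the class `ToyVectorizer`, its `min_df` pruning rule, and the `keep_discarded` flag are hypothetical names introduced for illustration. Only the attribute names `vocabulary_` and `stop_words_` mirror the real library. The sketch shows how tokens pruned from the vocabulary can still remain on the fitted object.

```python
from collections import Counter


class ToyVectorizer:
    """Hypothetical stand-in for a vectorizer that prunes rare tokens."""

    def __init__(self, min_df=2, keep_discarded=True):
        self.min_df = min_df                  # minimum document frequency to keep a token
        self.keep_discarded = keep_discarded  # True mimics the leaky behavior

    def fit(self, documents):
        # Count in how many documents each token appears.
        doc_freq = Counter()
        for doc in documents:
            doc_freq.update(set(doc.lower().split()))

        # Only tokens that are frequent enough enter the vocabulary.
        kept = sorted(t for t, df in doc_freq.items() if df >= self.min_df)
        self.vocabulary_ = {token: index for index, token in enumerate(kept)}

        if self.keep_discarded:
            # Leaky: every pruned token stays on the fitted object.
            self.stop_words_ = set(doc_freq) - set(kept)
        return self


docs = [
    "reset the password now",
    "the password is hunter2",
    "rotate the api key now",
]

leaky = ToyVectorizer(keep_discarded=True).fit(docs)
fixed = ToyVectorizer(keep_discarded=False).fit(docs)

print(sorted(leaky.vocabulary_))            # tokens the model actually needs
print("hunter2" in leaky.stop_words_)       # the rare, sensitive token was retained
print(hasattr(fixed, "stop_words_"))        # nothing retained when the flag is off
```

In this toy, anyone who receives the fitted `leaky` object (for example through a serialized model file) can read the discarded tokens directly from `stop_words_`, which is the kind of exposure the advisory describes.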
History

Thu, 24 Oct 2024 20:15:00 +0000

Type: Values Added (Values Removed: none listed)
First Time appeared: Scikit-learn; Scikit-learn scikit-learn
Weaknesses: CWE-922
CPEs: cpe:2.3:a:scikit-learn:scikit-learn:*:*:*:*:*:python:*:*
Vendors & Products: Scikit-learn; Scikit-learn scikit-learn
Metrics: cvssV3_1 {'score': 4.7, 'vector': 'CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N'}
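The 4.7 score can be reproduced from the vector string with the CVSS v3.1 base score equations. The sketch below is a minimal illustration, not a full CVSS calculator: the function names `roundup` and `base_score` are chosen here for illustration, and the code assumes Scope: Unchanged (as in this vector) and rejects anything else.

```python
import math

# CVSS v3.1 metric weights; the PR values are those for Scope: Unchanged.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    # Skip the leading "CVSS:3.1" prefix and map metric names to values.
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")

    w = {name: WEIGHTS[name][parts[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]

    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score("CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N")
print(score)  # 4.7
```

For this vector the impact sub-score is about 3.60 and the exploitability sub-score about 1.05; the low exploitability (local access, high attack complexity) is what keeps a high-confidentiality issue in the medium range.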


MITRE

Status: PUBLISHED

Assigner: @huntr_ai

Published: 2024-06-06T18:28:14.267Z

Updated: 2024-08-01T21:03:11.034Z

Reserved: 2024-05-22T15:52:49.284Z

Link: CVE-2024-5206

Vulnrichment

Updated: 2024-08-01T21:03:11.034Z

NVD

Status: Analyzed

Published: 2024-06-06T19:16:06.363

Modified: 2024-10-24T19:48:31.637

Link: CVE-2024-5206

Redhat

Severity: Moderate

Public Date: 2024-06-06T00:00:00Z

Links: CVE-2024-5206 - Bugzilla