In light of the recent security issue with Python's tarfile module (https://www.theregister.com/2022/09/22/python_vulnerability_tarfile/),
we are using the Creosote tool (https://github.com/advanced-threat-research/Creosote)
to check for this vulnerability in our code and in the packages installed in our Python virtual environment.
The following is the report generated by the Creosote tool:
Starting scan of:venv/
Scanning for Vulnerabilities:
Error reading file:venv/lib/python3.10/site-packages/joblib/test/test_func_inspect_special_encoding.py
'utf-8' codec can't decode byte 0xa4 in position 64: invalid start byte
Scan Completed
4 files with vulns: 0 vulns, 0 probable vulns, and 4 potential vulns found
venv/lib/python3.10/site-packages/pip/_vendor/distlib/util.py
Found potential vulns on lines: 1252
venv/lib/python3.10/site-packages/sklearn/datasets/_lfw.py
Found potential vulns on lines: 111
venv/lib/python3.10/site-packages/sklearn/datasets/_twenty_newsgroups.py
Found potential vulns on lines: 77
venv/lib/python3.10/site-packages/dateutil/zoneinfo/rebuild.py
Found potential vulns on lines: 24
As you can see, the report flags potential vulnerabilities in the sklearn/datasets subpackage. Is there a way to prevent sklearn from downloading these datasets?
More generally, how can we fix this vulnerability to avoid any production issues?
scikit-learn does not download datasets by default. So there are a few options.
Option 0: The exploit looks like it requires administrator privileges. Avoid:
    sudo python something.py
Option 1: Don't run the flagged code, i.e. the dataset fetchers defined in sklearn/datasets/_lfw.py and sklearn/datasets/_twenty_newsgroups.py.
Option 2: I'm not familiar with Creosote, but there are uses of tarfile in scikit-learn that do not appear to have been flagged, e.g. fetch_california_housing. If some fetch_ methods have potential vulnerabilities, these should be debugged and patched upstream.
Option 3: If the existence of this code in the package is considered dangerous for your organization, modify and build wheels that comply with your organization's security policies.
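For context on what a patch (Option 2 or 3) might look like: the tarfile issue is a path traversal problem, where an archive member with a name such as ../something gets written outside the extraction directory. Below is a minimal sketch of a defensive wrapper around extractall. The names is_within_directory, unsafe_members and safe_extract are hypothetical helpers made up for this illustration. They are not part of scikit-learn, Creosote, or the standard library, and this is not how scikit-learn handles it. The sketch also rejects every symlink and hard link outright, which is stricter than many archives need.

```python
import os
import tarfile


def is_within_directory(base_dir, member_name):
    """Return True if member_name resolves to a path inside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, member_name))
    return os.path.commonpath([base, target]) == base


def unsafe_members(tar, dest_dir):
    """List names of members that would escape dest_dir or that are links."""
    bad = []
    for member in tar.getmembers():
        escapes = not is_within_directory(dest_dir, member.name)
        is_link = member.issym() or member.islnk()
        if escapes or is_link:
            bad.append(member.name)
    return bad


def safe_extract(archive_path, dest_dir):
    """Extract archive_path into dest_dir, refusing suspicious archives."""
    with tarfile.open(archive_path) as tar:
        bad = unsafe_members(tar, dest_dir)
        if bad:
            # Refuse the whole archive rather than extracting part of it.
            raise ValueError(f"Refusing to extract, unsafe members: {bad}")
        tar.extractall(dest_dir)
```

Calling safe_extract("archive.tar", "out") raises ValueError before anything is written if any member name points outside out. Recent Python releases also provide extraction filters on tarfile.extractall, which are probably the better choice where available; the manual check above is only meant to show the idea on older interpreters.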