Summary
PickleScan cannot scan a ZIP archive for malicious pickle files when the archive contains a file with a bad Cyclic Redundancy Check (CRC). Instead of scanning the files in the archive regardless of their CRC, PickleScan exits with an error and returns no results. Attackers can use this to hide malicious pickle payloads in ZIP archives that PyTorch may still be able to load (PyTorch often disables CRC checks).
Details
PickleScan likely uses Python's built-in zipfile module to handle ZIP archives. When zipfile reads a file whose calculated CRC does not match the CRC declared in the archive, it can raise an exception (e.g., BadZipFile or a related error). PickleScan does not appear to attempt scanning the files regardless of their CRC.
This behavior contrasts with PyTorch's model loading, which in many cases may bypass CRC checks for ZIP archives regardless of configuration. The discrepancy creates a blind spot: a malicious model packaged in a ZIP with a bad CRC could be loaded by PyTorch while being completely missed by PickleScan.
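The zipfile behavior described above can be reproduced locally without downloading a model. The following is a minimal sketch, not PickleScan code: the helper names `make_zip_with_bad_crc` and `crc_error_for` are hypothetical, and the archive is a tiny stand-in for a real PyTorch checkpoint.

```python
import io
import zipfile


def make_zip_with_bad_crc(name="data.pkl", payload=b"example payload bytes"):
    """Build an in-memory ZIP, then flip one payload byte so the declared CRC no longer matches."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    raw = bytearray(buf.getvalue())
    offset = raw.find(payload)  # ZIP_STORED keeps the payload uncompressed, so it can be located
    raw[offset] ^= 0xFF         # corrupt the data; the declared CRC is left untouched
    return bytes(raw)


def crc_error_for(archive_bytes, name="data.pkl"):
    """Return the BadZipFile message raised while reading `name`, or None if it reads cleanly."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        try:
            zf.read(name)
        except zipfile.BadZipFile as exc:
            return str(exc)
    return None
```

Calling `crc_error_for(make_zip_with_bad_crc())` returns the error text from zipfile instead of the member's bytes, which is the point at which a scanner relying on the default behavior would stop.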
PoC
- Download an existing PyTorch model with a bad CRC:
wget "https://huggingface.co/jinaai/jina-embeddings-v2-base-en/resolve/main/pytorch_model.bin?download=true" -O pytorch_model.bin
- Attempt to scan the corrupted ZIP file with PickleScan:
# Assuming you have PickleScan installed and in your PATH
picklescan -p pytorch_model.bin

Observed Result: PickleScan returns no results and prints an error message indicating a problem with the ZIP file. It does not attempt to scan any potentially valid pickle files within the archive.
Expected Result: PickleScan should either:
- Attempt to extract and scan other valid files within the ZIP archive, even if some have CRC errors.
- Report a warning indicating that the ZIP archive has CRC errors and might be incomplete or corrupted, but still attempt to scan any accessible content.
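The first expected behavior could look roughly like the sketch below, which reads each member independently and records a warning instead of aborting the whole archive. The function name `read_members_tolerantly` is hypothetical and this is not PickleScan's implementation. Note that a member skipped this way has not been scanned, so a caller should report it as unscanned rather than clean.

```python
import io
import zipfile


def read_members_tolerantly(archive_bytes):
    """Return (contents, warnings); members that fail their CRC check are skipped with a warning."""
    contents = {}
    warnings = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            try:
                contents[info.filename] = zf.read(info.filename)
            except zipfile.BadZipFile as exc:
                # Keep going: one corrupted member should not hide the others.
                warnings.append(f"{info.filename}: {exc}")
    return contents, warnings
```

Each entry in `contents` can then be passed to the pickle scanner, while `warnings` lists the members that still need attention.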
Impact
Severity: High
Affected Users: Any organization or individual using PickleScan to analyze PyTorch models or other files distributed as ZIP archives for malicious pickle content.
Impact Details: Attackers can craft malicious PyTorch models containing embedded pickle payloads, package them into ZIP archives, and intentionally introduce CRC errors. This would cause PickleScan to fail to analyze the archive, while PyTorch is still able to load the model (depending on its configuration regarding CRC checks). This creates a significant vulnerability where malicious code can be distributed and potentially executed without detection by PickleScan.
Example: the PickleScan scan on Hugging Face ends in an error for this repository (https://huggingface.co/jinaai/jina-embeddings-v2-base-en/tree/main)

Recommendations:
PickleScan should not fail on a bad CRC, especially when PyTorch does not check the CRC.
The RelaxedZipFile class can fix this issue by skipping CRC validation:
--- picklescan/src/picklescan/relaxed_zipfile.py
+++ picklescan/src/picklescan/relaxed_zipfile.py
@@ class RelaxedZipFile(zipfile.ZipFile):
         try:
             # Skip the file header:
             fheader = zef_file.read(sizeFileHeader)
             if len(fheader) != sizeFileHeader:
                 raise zipfile.BadZipFile("Truncated file header")
             fheader = struct.unpack(structFileHeader, fheader)
             if fheader[_FH_SIGNATURE] != stringFileHeader:
                 raise zipfile.BadZipFile("Bad magic number for file header")
             zef_file.read(fheader[_FH_FILENAME_LENGTH])
             if fheader[_FH_EXTRA_FIELD_LENGTH]:
                 zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])
-            return zipfile.ZipExtFile(zef_file, mode, zinfo, pwd, True)
+
+            # Create the ZipExtFile and disable CRC check
+            ext_file = zipfile.ZipExtFile(zef_file, mode, zinfo, pwd)
+            # Monkey-patch to skip CRC validation
+            ext_file._expected_crc = None
+            return ext_file
         except BaseException:
             zef_file.close()
             raise
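The same idea can be tried in isolation with a small subclass that delegates to the standard `open()` and then clears the expected CRC. This is a standalone sketch rather than the patch above: the class name `CrcTolerantZipFile` is hypothetical, and `_expected_crc` is a private CPython attribute of `ZipExtFile`, so relying on it is an assumption that may not hold on other interpreters or future versions.

```python
import io
import zipfile


class CrcTolerantZipFile(zipfile.ZipFile):
    """ZipFile variant that returns member data even when the CRC does not match."""

    def open(self, name, mode="r", pwd=None, *, force_zip64=False):
        ext_file = super().open(name, mode, pwd, force_zip64=force_zip64)
        if mode == "r":
            # Assumption: CPython's ZipExtFile skips CRC validation when this
            # private attribute is None.
            ext_file._expected_crc = None
        return ext_file
```

Because `ZipFile.read()` goes through `open()`, `CrcTolerantZipFile(path).read(member)` returns the stored bytes of a member with a bad CRC, which lets the scanner inspect the same content PyTorch would load.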