What is data poisoning, and why is loading a pickle model file dangerous?
Data poisoning is an attack where an adversary injects malicious or mislabeled examples into the training data to bias the model, create backdoors, or degrade it, and it is hard to detect because the model still trains successfully. Loading a pickle model is dangerous because Python's pickle executes arbitrary code on deserialization, so a malicious .pkl or .pt file from an untrusted source can run attacker code the moment you load it. Defenses include trusted data provenance and validation, and using safe formats like safetensors plus scanning model files.
How to think about it
The short answer
Data poisoning is an attack where an adversary injects malicious or mislabeled examples into the training data to bias the model, plant a backdoor, or degrade accuracy. Loading a pickle model is dangerous because Python’s pickle can execute arbitrary code during deserialization: a malicious .pkl or .pt file runs the attacker’s code the instant you load it. Both threats appear in OWASP’s ML and GenAI security risk lists, the first as data poisoning and the second under supply-chain risks.
Data poisoning, in depth
Because training still “succeeds,” poisoning is hard to spot. Variants include backdoor/trigger attacks (the model behaves normally except on a secret trigger), label flipping, and supply-chain poisoning of public datasets: “split-view” poisoning, where the content behind an indexed URL changes after the dataset was curated, and “frontrunning,” where malicious edits are timed to land just before a snapshot is taken. It’s especially dangerous for continual/online learning, where poisoned feedback bends the live model in real time. Defenses: trusted data provenance, input validation and anomaly detection, dataset versioning (so you can audit and roll back), and robust training methods that limit how much any single example can move the model.
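One way to make the provenance and versioning defenses concrete is to pin a hash of every record when a dataset is vetted, then diff later copies against that manifest. The sketch below is a minimal illustration using only the standard library; the record layout, the IDs, and the function names (fingerprint, build_manifest, diff_against_manifest) are assumptions made for the example, not part of any particular tool.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Hash a record in a canonical JSON form so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(dataset: dict) -> dict:
    """Map each record ID to the hash of its content."""
    return {record_id: fingerprint(rec) for record_id, rec in dataset.items()}


def diff_against_manifest(dataset: dict, manifest: dict) -> dict:
    """Report records that changed, appeared, or vanished since the manifest was pinned."""
    current = build_manifest(dataset)
    return {
        "changed": sorted(r for r in current if r in manifest and current[r] != manifest[r]),
        "added": sorted(r for r in current if r not in manifest),
        "removed": sorted(r for r in manifest if r not in current),
    }


# Hypothetical dataset at the time it was reviewed and pinned.
vetted = {
    "ex-001": {"text": "great product", "label": "positive"},
    "ex-002": {"text": "broke in a day", "label": "negative"},
}
manifest = build_manifest(vetted)

# A later copy: one label has been flipped and one unreviewed record added.
later_copy = {
    "ex-001": {"text": "great product", "label": "positive"},
    "ex-002": {"text": "broke in a day", "label": "positive"},
    "ex-003": {"text": "unreviewed sample", "label": "positive"},
}
report = diff_against_manifest(later_copy, manifest)
print(report)  # {'changed': ['ex-002'], 'added': ['ex-003'], 'removed': []}
```

A manifest like this only catches changes made after the data was vetted. Poison that was already present at pinning time still needs validation and anomaly detection.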
The pickle problem
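PyTorch’s torch.save and scikit-learn’s usual persistence routes (pickle and joblib) are pickle-based. Pickle’s __reduce__ hook lets an object name a callable to invoke when it is rebuilt, which amounts to “run this code on load,” so a model downloaded from a public hub can carry an embedded payload that executes on pickle.load, joblib.load, or torch.load with weights_only=False. Since PyTorch 2.6, torch.load defaults to weights_only=True, which restricts unpickling to an allowlist of types; older versions, and any call that explicitly passes weights_only=False, remain exposed. Security researchers have reported malicious pickle payloads in models hosted on public hubs.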
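The mechanism is easy to see with a harmless payload. In this minimal sketch, the Payload class is a hypothetical stand-in for an attacker-crafted "model" object, and the expression passed to eval is deliberately benign; a real attack would substitute something like a shell command.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an object an attacker saved as a 'model'."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" as the recipe for rebuilding
        # this object. The callable and its arguments are attacker-chosen.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())  # the bytes that would be written to a .pkl file
result = pickle.loads(blob)     # loading invokes eval(...) immediately

print(result)  # 42
```

Note that the loader never needs the Payload class: the pickle stream only references the built-in eval, so the code runs on any machine that calls pickle.loads on these bytes.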
Defenses:
- Prefer safetensors (Hugging Face’s safe format — data only, no code execution) over pickle.
- Scan model files (e.g., picklescan) and pin/verify checksums.
- Treat third-party models like untrusted code: load in sandboxed, least-privilege environments.
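To illustrate what scanning means, the sketch below walks a pickle stream with the standard library’s pickletools and lists the importable names it references, without ever executing it. This is a naive illustration of the idea, not how picklescan is implemented: the helper name referenced_globals is an assumption, and the string tracking used to resolve STACK_GLOBAL ignores memo lookups, so a determined attacker could evade it.

```python
import pickle
import pickletools


def referenced_globals(blob: bytes) -> set[str]:
    """List 'module.name' imports that a pickle would perform, without loading it."""
    found: set[str] = set()
    recent_strings: list[str] = []  # string constants seen so far, in order
    for opcode, arg, _pos in pickletools.genops(blob):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports these as a single "module name" string.
            found.add(arg.replace(" ", ".", 1))
        elif opcode.name == "STACK_GLOBAL":
            # Module and name were pushed as the two preceding strings
            # (simplification: values fetched from the memo are not tracked).
            if len(recent_strings) >= 2:
                found.add(f"{recent_strings[-2]}.{recent_strings[-1]}")
            else:
                found.add("<unresolved>")
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


class Payload:
    """Hypothetical malicious object with a deliberately benign payload."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


malicious = pickle.dumps(Payload())
plain_data = pickle.dumps({"weights": [0.1, 0.2, 0.3]})

print(referenced_globals(malicious))   # {'builtins.eval'}
print(referenced_globals(plain_data))  # set()
```

In CI, a check along these lines would fail the build when a file references anything outside an allowlist of expected names. Real model checkpoints legitimately reference framework functions, which is why an allowlist, rather than "no globals at all," is the practical policy.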
Concrete example
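A teammate grabs model.pt off an unknown repo and runs torch.load("model.pt") on a box with cloud credentials. If that call goes through the full pickle path (weights_only=False, or an older PyTorch where that was the default), the embedded payload exfiltrates those creds before the model even finishes loading. Using safetensors plus a scan in CI would have closed off this attack path.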
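Pinning a checksum is the cheapest guard to add to a scenario like this. The sketch below is a minimal, standard-library illustration; the function names (sha256_of, verify_pinned) and the file contents are assumptions for the example, and in practice the pinned digest would come from the publisher or an internal registry rather than being computed locally.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, expected_sha256: str) -> None:
    """Raise ValueError unless the file matches the pinned digest."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"checksum mismatch for {path.name}: refusing to load")


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.bin"
    model_path.write_bytes(b"trusted weights")
    pinned = sha256_of(model_path)  # stand-in for a digest obtained out of band

    verify_pinned(model_path, pinned)  # matches, so no exception

    # Simulate the file being swapped or modified after the digest was pinned.
    model_path.write_bytes(b"trusted weights plus payload")
    try:
        verify_pinned(model_path, pinned)
        tampering_detected = False
    except ValueError:
        tampering_detected = True

print(tampering_detected)  # True
```

A matching checksum only proves the file is the one that was pinned. It says nothing about whether that file was safe in the first place, which is why it complements scanning and safe formats rather than replacing them.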
Common follow-up / trap
Interviewers ask: “Is safetensors a complete fix?” It removes the code-execution risk of deserialization but doesn’t address poisoning of the weights or data: a clean-format model can still be backdoored. The trap is conflating the two threats. Provenance and validation address poisoning; safe formats and scanning address deserialization.