What is the difference between shadow deployment and canary deployment for ML models, and when do you use each?
Shadow deployment mirrors live traffic to the new model and records its predictions without ever serving them, so you can evaluate performance and load without changing what users see. Canary deployment routes a small slice of real traffic to the new model and serves its predictions, so real user impact is possible but limited and monitored.
How to think about it
Shadow deployment duplicates every incoming request to both the current champion model and the new challenger. The challenger’s response is logged but never returned to the user. This lets you measure prediction distributions, latency, error rates, and resource consumption under real production load without changing any user-facing response, provided the challenger runs off the request path and on capacity that does not starve the champion. What shadow mode cannot measure is user behavior, because users never see the challenger’s output. It is the standard step before a canary when the new model is a significant change: a different architecture, a different feature set, or the first deployment of any model.
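The request flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production serving layer: `handle_request`, `_run_shadow`, and the `Model` type are hypothetical names, and the models are assumed to be plain callables. The champion answers the user; the challenger runs on a background thread and its result is only logged.

```python
import logging
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Tuple

logger = logging.getLogger("shadow")

# Hypothetical type: a model is any callable mapping a request to a prediction.
Model = Callable[[Any], Any]

# Background pool so the challenger never blocks the user-facing response.
_shadow_pool = ThreadPoolExecutor(max_workers=4)


def _run_shadow(challenger: Model, request: Any, champion_pred: Any) -> dict:
    """Call the challenger, record its output and latency, swallow failures."""
    record = {"request": request, "champion": champion_pred}
    start = time.perf_counter()
    try:
        record["challenger"] = challenger(request)
        record["error"] = None
    except Exception as exc:  # a challenger failure must never reach the user
        record["challenger"] = None
        record["error"] = repr(exc)
    record["latency_s"] = time.perf_counter() - start
    logger.info("shadow_result %s", record)
    return record


def handle_request(
    request: Any, champion: Model, challenger: Model
) -> Tuple[Any, Future]:
    """Serve the champion's prediction; mirror the request to the challenger."""
    champion_pred = champion(request)
    # Fire-and-forget: the future is returned only so callers can inspect it.
    future = _shadow_pool.submit(_run_shadow, challenger, request, champion_pred)
    return champion_pred, future
```

Only `champion_pred` would be returned to the user; the logged records are what you later compare offline (prediction agreement, latency, error counts).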
Canary deployment routes a small percentage of real traffic (typically 1–5%) to the new model and serves its predictions to actual users. The remaining traffic continues to the champion. Business metrics (click-through, conversion, revenue) and technical metrics (p99 latency, error rate) are monitored on both slices and compared. If the canary slice degrades, its traffic weight is set back to 0%, which is a routing change rather than a code change and therefore takes effect quickly.
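As an illustration of the traffic split, here is a small Python sketch of a weighted router. The `CanaryRouter` class and its method names are hypothetical, and hashing the user ID is one assumed way to keep each user on the same variant across requests; real setups usually do the split in a load balancer or service mesh instead.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable, Tuple

# Hypothetical type: a model is any callable mapping a request to a prediction.
Model = Callable[[Any], Any]


@dataclass
class CanaryRouter:
    champion: Model
    challenger: Model
    canary_percent: float = 1.0  # share of traffic served by the challenger

    @staticmethod
    def _bucket(user_id: str) -> float:
        """Map a user ID to a stable bucket in [0, 100) with 0.01 resolution."""
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return (int.from_bytes(digest[:8], "big") % 10_000) / 100

    def variant(self, user_id: str) -> str:
        """Return which slice this user belongs to at the current weight."""
        if self._bucket(user_id) < self.canary_percent:
            return "canary"
        return "champion"

    def predict(self, user_id: str, request: Any) -> Tuple[str, Any]:
        """Serve the prediction and tag it so metrics can be split by slice."""
        variant = self.variant(user_id)
        model = self.challenger if variant == "canary" else self.champion
        return variant, model(request)

    def rollback(self) -> None:
        """Send all traffic back to the champion; no redeploy involved."""
        self.canary_percent = 0.0


router = CanaryRouter(champion=lambda r: "old", challenger=lambda r: "new",
                      canary_percent=5.0)
variant, prediction = router.predict("user-42", {"features": [1, 2, 3]})
```

Tagging each response with its variant is what makes the per-slice comparison of business and technical metrics possible.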
Typical promotion path: Shadow → Canary (1%) → Canary (10%) → Full rollout. The exact percentages vary by team; the Argo Rollouts example below steps through 5% and 25%.
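# Argo Rollouts canary strategy (fragment of a Rollout's spec)
strategy:
  canary:
    steps:
    - setWeight: 5              # send 5% of traffic to the new model
    - pause: {duration: 30m}    # observe before widening
    - setWeight: 25
    - pause: {duration: 1h}
    - setWeight: 100            # full rollout
    # Background analysis that runs during the rollout; a failed run aborts it.
    analysis:
      templates:
      - templateName: model-error-rate   # AnalysisTemplate defined separately
      args:
      - name: threshold
        value: "0.01"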
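
Conceptually, the stepped weights plus an error-rate threshold amount to a gated loop: widen traffic, check the metric, and either continue or roll back. The Python sketch below illustrates that idea only. It is not how Argo Rollouts is implemented, and `run_promotion`, `error_rate_at`, and the "strictly greater than" comparison are assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple


def run_promotion(
    weights: Sequence[int],
    error_rate_at: Callable[[int], float],
    threshold: float = 0.01,
) -> Tuple[str, int]:
    """Walk through canary weights and roll back on the first breach.

    weights:        traffic percentages to step through, e.g. [5, 25, 100]
    error_rate_at:  hypothetical hook returning the canary's observed
                    error rate after soaking at the given weight
    threshold:      maximum tolerated error rate (assumed strict '>' check)

    Returns (outcome, final_weight).
    """
    if not weights:
        raise ValueError("weights must contain at least one step")
    for weight in weights:
        observed = error_rate_at(weight)
        if observed > threshold:
            # Breach: send all traffic back to the champion.
            return "rolled_back", 0
    return "promoted", weights[-1]


# A healthy model passes every gate and ends at the last weight.
healthy = run_promotion([5, 25, 100], lambda w: 0.002)

# A model that degrades under more load is rolled back at the 25% step.
degrading = run_promotion([5, 25, 100], lambda w: 0.05 if w >= 25 else 0.001)
```

In practice `error_rate_at` would query a metrics backend after each pause; here it is a stand-in so the control flow is visible.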