What is data augmentation in computer vision and which techniques are most effective?
Data augmentation artificially expands the training set by applying transformations, usually label-preserving ones, to existing images, which improves generalisation and acts as a regulariser without collecting more data. Geometric transforms (flip, crop, rotation) and colour jitter help on most natural-image tasks; stronger methods such as CutMix, MixUp, and RandAugment often improve accuracy further on top of basic augmentation.
How to think about it
Go beyond listing transforms — explain why each one regularises the model, and distinguish augmentations that are always safe from those that require domain knowledge.
Why augmentation works
A CNN trained on a limited dataset tends to memorise specific image statistics. Augmentation pushes the model to learn the invariances the transforms encode: if it sees flipped cats at training time, it is far less likely to fail on flipped cats at test time. Augmentation therefore acts as an implicit regulariser, analogous to dropout but applied in input space.
Standard geometric transforms
- Random horizontal flip — safe for most natural images; avoid for text or for objects whose left-right orientation carries meaning
- Random crop — crop to between (0.75 * size) and (1.0 * size), then resize back; removes location bias
- Random rotation (±10–30°) — useful for aerial and medical images; less so for upright scenes
- Random scale / resize — forces the model to recognise objects at multiple scales
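The flip and crop transforms above can be sketched without any imaging library. This is a minimal, dependency-free illustration that assumes an image is a list of rows of pixel values; the function names (`hflip`, `random_crop`, `resize_nearest`, `augment`) and the nearest-neighbour resize are choices made for this sketch, not part of any particular framework. A real pipeline would normally rely on an optimised library instead.

```python
import random

def hflip(img):
    """Mirror an image (a list of rows) left to right."""
    return [row[::-1] for row in img]

def random_crop(img, scale=(0.75, 1.0), rng=random):
    """Cut a random window whose sides are a random fraction of the original."""
    h, w = len(img), len(img[0])
    s = rng.uniform(*scale)
    ch, cw = max(1, int(h * s)), max(1, int(w * s))
    top = rng.randint(0, h - ch)    # randint is inclusive at both ends
    left = rng.randint(0, w - cw)
    return [row[left:left + cw] for row in img[top:top + ch]]

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize, used here to restore the original size."""
    h, w = len(img), len(img[0])
    return [[img[r * h // out_h][c * w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def augment(img, p_flip=0.5, rng=random):
    """Random crop, resize back, then flip with probability p_flip."""
    h, w = len(img), len(img[0])
    out = resize_nearest(random_crop(img, rng=rng), h, w)
    if rng.random() < p_flip:
        out = hflip(out)
    return out
```

Calling `augment(img)` once per training step gives the model a slightly different view of the same image each epoch while the label stays unchanged.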
Colour transforms
- Colour jitter — randomly perturb brightness, contrast, saturation, hue; cheap and consistently helpful
- Grayscale conversion — drop colour with small probability; prevents over-reliance on colour cues
- Gaussian blur — suppresses fine high-frequency detail, pushing the model towards coarser shape and structure rather than pixel-level texture or noise artefacts
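A rough sketch of the colour transforms, assuming pixels are (r, g, b) tuples of floats in [0, 1]. Only brightness, contrast, and random grayscale are shown; saturation, hue, and blur are left out to keep it short. The jitter ranges, the `p_gray` default, and the use of the mean over all channel values as the contrast pivot are simplifying assumptions made for illustration.

```python
import random

def clamp(v):
    """Keep a channel value inside [0, 1]."""
    return max(0.0, min(1.0, v))

def adjust_brightness(img, factor):
    """Scale every channel value by the same factor."""
    return [[tuple(clamp(c * factor) for c in px) for px in row] for row in img]

def adjust_contrast(img, factor):
    """Move values away from (factor > 1) or towards (factor < 1) the mean."""
    values = [c for row in img for px in row for c in px]
    mean = sum(values) / len(values)
    return [[tuple(clamp(mean + (c - mean) * factor) for c in px) for px in row]
            for row in img]

def to_grayscale(img):
    """Replace each pixel by its luma (BT.601 weights), kept on three channels."""
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            y = 0.299 * r + 0.587 * g + 0.114 * b
            new_row.append((y, y, y))
        out.append(new_row)
    return out

def colour_jitter(img, brightness=0.2, contrast=0.2, p_gray=0.1, rng=random):
    """Apply random brightness and contrast, then grayscale with small probability."""
    img = adjust_brightness(img, rng.uniform(1 - brightness, 1 + brightness))
    img = adjust_contrast(img, rng.uniform(1 - contrast, 1 + contrast))
    if rng.random() < p_gray:
        img = to_grayscale(img)
    return img
```

Keeping the jitter ranges small preserves the label in most natural-image tasks; wider ranges are a domain decision.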
Advanced augmentation policies
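MixUp: blend two images and their labels linearly: x' = λ*x1 + (1-λ)*x2 and y' = λ*y1 + (1-λ)*y2, with λ drawn from a Beta(α, α) distribution. Training on these interpolated examples tends to produce smoother decision boundaries.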
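The MixUp formula can be written out directly. This sketch assumes flattened images as lists of floats, one-hot labels, and a mixing weight drawn from Beta(alpha, alpha); the default `alpha=0.2` and the helper names are illustrative assumptions rather than recommended settings.

```python
import random

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two flattened images and their one-hot labels with the same weight."""
    lam = rng.betavariate(alpha, alpha)   # mixing weight in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because the label is now a soft distribution, the loss has to accept soft targets, for example cross-entropy computed against `y` rather than against a single class index.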
CutMix: paste a rectangular crop from one image into another; the label is mixed in proportion to the area each image contributes. Encourages the model to use features from the full image, not just the most salient patches.
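A minimal CutMix sketch, assuming single-channel images stored as lists of rows and one-hot labels. The box is sized so that its area is roughly (1 - lam) of the image, with lam drawn from Beta(alpha, alpha); the default `alpha=1.0` and the function name are assumptions made for illustration. The label weight is recomputed from the clipped box so that it matches the pixels actually pasted.

```python
import math
import random

def cutmix(img1, y1, img2, y2, alpha=1.0, rng=random):
    """Paste a random box from img2 into a copy of img1 and mix labels by area."""
    h, w = len(img1), len(img1[0])
    lam = rng.betavariate(alpha, alpha)
    # Box sides scale with sqrt(1 - lam), so box area is about (1 - lam) * h * w.
    cut_h = int(h * math.sqrt(1 - lam))
    cut_w = int(w * math.sqrt(1 - lam))
    cy, cx = rng.randrange(h), rng.randrange(w)   # box centre
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = [row[:] for row in img1]              # leave the inputs untouched
    for r in range(top, bottom):
        mixed[r][left:right] = img2[r][left:right]
    # Clipping at the border can shrink the box, so derive lam from the real area.
    lam = 1 - (bottom - top) * (right - left) / (h * w)
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return mixed, y, lam
```

Unlike MixUp, every pixel in the result comes from exactly one source image, so local image statistics stay natural while the label is still soft.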
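RandAugment (Cubuk et al., 2019): for each image, sample N transforms uniformly from a fixed set and apply each at a shared magnitude M. This reduces per-dataset tuning to two hyperparameters instead of a hand-designed or searched policy; the paper reported state-of-the-art ImageNet accuracy at the time of publication.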
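The sampling scheme behind RandAugment can be sketched as follows. The op set here is a deliberately reduced, hypothetical one (four toy operations on grayscale images with values in [0, 1]), and the way each op maps the magnitude to a strength is an assumption of this sketch; the published method uses a larger set of transforms and its own magnitude ranges.

```python
import random

MAX_M = 10  # magnitude scale assumed for this sketch

def identity(img, m):
    return img

def brightness(img, m):
    """Brighten by a factor that grows with magnitude."""
    f = 1.0 + 0.9 * m / MAX_M
    return [[min(1.0, v * f) for v in row] for row in img]

def solarize(img, m):
    """Invert pixels above a threshold that drops as magnitude grows."""
    t = 1.0 - m / MAX_M
    return [[1.0 - v if v > t else v for v in row] for row in img]

def translate_x(img, m):
    """Shift right by up to about a third of the width, padding with zeros."""
    w = len(img[0])
    s = int(w * 0.33 * m / MAX_M)
    return [[0.0] * s + row[:w - s] for row in img]

OPS = [identity, brightness, solarize, translate_x]

def rand_augment(img, n=2, m=9, rng=random):
    """Apply n ops sampled uniformly (with replacement), all at magnitude m."""
    for op in rng.choices(OPS, k=n):
        img = op(img, m)
    return img
```

The only knobs exposed are `n` and `m`, which is the point of the method: a small grid search over two integers replaces a per-dataset policy search.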
Test-time augmentation (TTA): apply several augmentations to each input at inference and average the predictions. It needs no retraining and often gives a small accuracy gain, but inference cost grows with the number of augmented views.
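A small sketch of TTA with two views, the original image and its horizontal flip. `model` is assumed to be any callable that maps an image to a list of class probabilities, and `toy_model` is a hypothetical stand-in that is deliberately position-sensitive so the effect of averaging is visible.

```python
def identity(img):
    return img

def hflip(img):
    """Mirror an image (a list of rows) left to right."""
    return [row[::-1] for row in img]

def predict_tta(model, img, views=(identity, hflip)):
    """Average class probabilities over several views of the same image."""
    preds = [model(view(img)) for view in views]
    n = len(preds)
    return [sum(p[k] for p in preds) / n for k in range(len(preds[0]))]

def toy_model(img):
    """Hypothetical classifier: probability of class 0 is the mean of column 0."""
    p = sum(row[0] for row in img) / len(img)
    return [p, 1.0 - p]
```

With these two views the averaged prediction is the same for an image and its mirror, which is the invariance TTA buys at the price of one extra forward pass per view.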
When augmentation must match domain
Medical imaging (X-ray, MRI) requires careful choices — vertical flip may be anatomically invalid; extreme colour shifts may destroy diagnostic signal. Always verify that augmentations preserve the semantics the label captures.