How do you train a deep learning model when you have very little labelled data?
Small labelled datasets call for a layered strategy: transfer learning from a pretrained backbone, heavy data augmentation, self-supervised pretraining on unlabelled data, and regularisation to prevent the model memorising the few examples it sees.
How to think about it
“Small” is relative to task complexity. A 500-image binary classifier is feasible; a 500-example multilabel medical segmentation task is not. Know which regime you’re in before choosing a strategy.
Strategy stack — apply roughly in order of impact
1. Transfer learning (highest leverage)
Start from a pretrained backbone — ResNet, ViT, BERT, Whisper — and fine-tune only the head. The backbone already encodes rich representations learned from millions of examples. This is the single most effective intervention.
2. Data augmentation
Expand the effective dataset without new labels. For vision: random crops, flips, colour jitter, Mixup, CutMix. For text: synonym replacement, back-translation, easy data augmentation (EDA). For audio: time-stretch, pitch shift, additive noise.
from torchvision import transforms
train_transform = transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
transforms.ColorJitter(brightness=0.3, contrast=0.3),
transforms.ToTensor(),
])
3. Self-supervised or semi-supervised learning
If you have unlabelled data in the same domain, pretrain or fine-tune the backbone on it using contrastive learning (SimCLR, MoCo) or masked prediction (MAE, BERT MLM). Even 10 k unlabelled images can meaningfully improve a model trained on 500 labelled ones.
4. Strong regularisation
With few examples every training signal is precious — and every noise signal is dangerous. Use dropout, weight decay, early stopping, and label smoothing together.
5. Reconsider classical ML
If you have structured features and fewer than ~2 k labelled examples, XGBoost or a regularised logistic regression will likely outperform any neural net. Do not fight the data regime — match the model to it.