datarekha
Deep Learning · Medium · Asked at Google, OpenAI, Meta, Hugging Face

What is transfer learning and when should you use full fine-tuning vs feature extraction?

The short answer

Transfer learning reuses weights pretrained on a large dataset as a starting point for a new task. Feature extraction freezes the backbone and trains only a new head; full fine-tuning updates all weights. The right choice depends on dataset size and how similar the new task is to the pretraining domain.

How to think about it

A pretrained network has already learned general-purpose features: low-level ones such as edges and textures in vision, and morphology and syntax in NLP. Transfer learning asks whether you can reuse that knowledge instead of learning everything from scratch.

Three regimes

[Figure: Feature Extraction: backbone (frozen), new head (trained) · Full Fine-tuning: backbone (trained), new head (trained) · LoRA: backbone (frozen), low-rank adapters]
Three transfer learning strategies. Purple = trained weights; grey = frozen weights; amber = adapter weights.
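To make the three regimes concrete, here is a minimal PyTorch sketch on a toy model. The small `nn.Sequential` backbone is a hypothetical stand-in for a real pretrained network, and `LoRALinear` is a hand-rolled adapter written only for illustration (in practice a library such as Hugging Face `peft` handles this). It also assumes the new head is trained in the LoRA regime. The point is simply to show which weights each regime leaves trainable.

```python
import torch
from torch import nn


class LoRALinear(nn.Module):
    """Hypothetical illustration: a frozen nn.Linear plus a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # A starts small and random, B starts at zero, so the layer initially matches `base`
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank correction x A^T B^T
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


def make_model() -> nn.Sequential:
    # Toy "pretrained" backbone (64 -> 64 -> 64) followed by a new 2-class head
    backbone = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
    head = nn.Linear(64, 2)
    return nn.Sequential(backbone, head)


def count_trainable(model: nn.Module) -> int:
    # Number of scalar weights that will receive gradients
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# 1) Feature extraction: freeze the backbone, train only the head
fe = make_model()
for p in fe[0].parameters():
    p.requires_grad = False

# 2) Full fine-tuning: every weight stays trainable
ft = make_model()

# 3) LoRA: wrap each backbone Linear in a frozen layer + adapters; adapters and head train
lora = make_model()
for i, layer in list(enumerate(lora[0])):
    if isinstance(layer, nn.Linear):
        lora[0][i] = LoRALinear(layer, r=4)

for name, m in [("feature extraction", fe), ("full fine-tuning", ft), ("LoRA", lora)]:
    print(f"{name}: {count_trainable(m)} trainable params")
```

With this toy model, feature extraction trains only the 130 head parameters, full fine-tuning trains all 8,450, and LoRA trains 1,154 (1,024 adapter weights plus the head). Because `lora_B` starts at zero, each wrapped layer initially behaves exactly like the frozen original.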

Decision guide

| Scenario | Best approach |
|---|---|
| Small dataset, similar domain | Feature extraction: few weights to train, low overfitting risk |
| Large dataset, different domain | Full fine-tuning: the backbone needs to adapt |
| LLM or other large model, any dataset size | LoRA: fine-tuning at a fraction of the compute and memory |
| Tiny dataset, very different domain | Collect more data; fine-tuning a large model here is likely to overfit |
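The LoRA row's efficiency claim comes from simple arithmetic. Instead of updating a full d_out × d_in weight matrix W, LoRA learns W + BA, with B of shape d_out × r and A of shape r × d_in, so it trains r(d_out + d_in) values instead of d_out · d_in. A quick sketch, assuming a single 768 × 768 projection (BERT-base's hidden size) and rank r = 8:

```python
def lora_param_counts(d_out: int, d_in: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable weight counts for one d_out x d_in matrix."""
    full = d_out * d_in        # full fine-tuning updates every entry of W
    lora = r * (d_out + d_in)  # LoRA trains B (d_out x r) and A (r x d_in)
    return full, lora


# Example: one 768 x 768 projection, rank r = 8
full, lora = lora_param_counts(768, 768, r=8)
print(full, lora, f"{lora / full:.2%}")  # prints: 589824 12288 2.08%
```

The memory savings come from storing gradients and optimizer state (Adam keeps two moment buffers per trained value) only for the adapters. The frozen weights still take part in every forward pass, so per-step compute drops less than the parameter ratio suggests.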
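With Hugging Face transformers, feature extraction on BERT looks like this:

import torch
from transformers import AutoModelForSequenceClassification

# Loads pretrained BERT weights; the 2-label classification head is newly (randomly) initialized
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Feature extraction: freeze everything except the classifier head
for name, param in model.named_parameters():
    if "classifier" not in name:  # head params are "classifier.weight" and "classifier.bias"
        param.requires_grad = False

# Give the optimizer only the parameters that are still trainable (the head)
optimizer = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-4)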
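Because a frozen backbone's outputs never change, feature extraction can skip the backbone during training entirely: run it once under `torch.no_grad()`, cache the features, and fit only the head on them. A minimal sketch, assuming a toy `nn.Sequential` encoder as a hypothetical stand-in for BERT and a synthetic dataset:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a frozen pretrained encoder (e.g. BERT's pooled output)
backbone = nn.Sequential(nn.Linear(32, 16), nn.Tanh())
backbone.requires_grad_(False)  # freeze every backbone weight
backbone.eval()                 # in a real model this disables dropout

# Synthetic dataset: 200 examples with binary labels
X = torch.randn(200, 32)
y = (X[:, 0] > 0).long()

# Run the frozen backbone ONCE and cache its outputs
with torch.no_grad():
    features = backbone(X)

# New head trained on the cached features only
head = nn.Linear(16, 2)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

with torch.no_grad():
    initial_loss = loss_fn(head(features), y).item()

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(head(features), y)
    loss.backward()
    optimizer.step()

print(f"loss: {initial_loss:.3f} -> {loss.item():.3f}")
```

Caching only works when the backbone is fully frozen and inputs are not randomly augmented between epochs; with data augmentation the backbone has to run on every step.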

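For full fine-tuning, use a lower learning rate for the backbone (e.g. 1e-5) than for the head (e.g. 1e-4). This is called discriminative learning rates: small updates keep the pretrained features from drifting too far, which helps mitigate catastrophic forgetting, while the randomly initialized head can still learn quickly.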
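In PyTorch, discriminative learning rates are usually expressed as optimizer parameter groups. A minimal sketch, assuming a toy `Classifier` whose `backbone` and `classifier` attributes are hypothetical stand-ins (for `BertForSequenceClassification` the corresponding modules are `model.bert` and `model.classifier`):

```python
import torch
from torch import nn


class Classifier(nn.Module):
    """Toy model: a 'pretrained' backbone plus a freshly initialized head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(32, 32), nn.ReLU())
        self.classifier = nn.Linear(32, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.backbone(x))


model = Classifier()

# One optimizer, two parameter groups, each with its own learning rate
optimizer = torch.optim.AdamW(
    [
        {"params": model.backbone.parameters(), "lr": 1e-5},    # small steps: preserve pretrained features
        {"params": model.classifier.parameters(), "lr": 1e-4},  # larger steps: head starts from random init
    ],
    weight_decay=0.01,
)

for group in optimizer.param_groups:
    print(group["lr"], sum(p.numel() for p in group["params"]))
```

A common extension, used in ULMFiT, gives each layer its own rate, decreasing it layer by layer toward the input.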
