Deep Learning · Medium · Asked at Google, OpenAI, Meta, Hugging Face

What is transfer learning and when should you use full fine-tuning vs feature extraction?

For: ML Engineer · AI / LLM Engineer · Data Scientist

The short answer

Transfer learning reuses weights pretrained on a large dataset as a starting point for a new task. Feature extraction freezes the backbone and trains only a new head; full fine-tuning updates all weights. The right choice depends on dataset size and how similar the new task is to the pretraining domain.

How to think about it

A pretrained network has already learned low-level features (edges, textures in vision; morphology and syntax in NLP). Transfer learning asks: can I reuse that knowledge instead of learning from scratch?

Three regimes

Figure: Three transfer learning strategies. Purple = trained weights; grey = frozen weights; amber = adapter weights.
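To make the three regimes concrete, here is a back-of-the-envelope count of trainable parameters. It is a sketch under stated assumptions: BERT-base dimensions (hidden size 768, 12 encoder layers), a 2-label linear head, and LoRA with rank 8 applied only to the query and value projections of each layer. The helper names `linear_params` and `lora_params` are hypothetical, not a library API.

```python
def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Parameters in a dense layer: a d_in x d_out weight plus an optional bias."""
    return d_in * d_out + (d_out if bias else 0)


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA learns the weight update as B @ A, with A of shape (rank, d_in) and B of shape (d_out, rank)."""
    return rank * (d_in + d_out)


HIDDEN = 768           # assumed BERT-base hidden size
LAYERS = 12            # assumed BERT-base encoder depth
NUM_LABELS = 2
RANK = 8               # assumed LoRA rank
ADAPTED_PER_LAYER = 2  # assumed: query and value projections only

# Feature extraction: only the new classification head is trained.
head = linear_params(HIDDEN, NUM_LABELS)

# LoRA: the backbone stays frozen; only the low-rank adapters (plus the head) train.
adapters = LAYERS * ADAPTED_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)

# The dense query/value weights those adapters sit beside; full fine-tuning
# updates these and every other weight in the model.
dense_qv = LAYERS * ADAPTED_PER_LAYER * linear_params(HIDDEN, HIDDEN, bias=False)

print(f"head only:           {head:,}")
print(f"LoRA adapters:       {adapters:,}")
print(f"dense q/v weights:   {dense_qv:,}")
print(f"adapter/dense ratio: {adapters / dense_qv:.4f}")
```

Each rank-8 adapter on a 768×768 matrix holds 8 × (768 + 768) = 12,288 values, about 2% of the 589,824 it adapts. For comparison, full fine-tuning of BERT-base updates all of its roughly 110M parameters.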

Decision guide

Scenario	Best approach
Small dataset, similar domain	Feature extraction: few weights to train, low overfitting risk
Large dataset, different domain	Full fine-tuning: the backbone needs to adapt
LLM / large model, any dataset size	LoRA: trains small low-rank adapters, so gradients and optimizer state need far less memory than full fine-tuning
Tiny dataset, very different domain	Collect more data first; fine-tuning a large model here will likely overfit
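The LoRA row can be illustrated with a minimal low-rank adapter wrapped around a frozen linear layer. This is a from-scratch PyTorch sketch of the idea, not the Hugging Face PEFT implementation; the class name `LoRALinear` and the defaults `rank=8`, `alpha=16` are illustrative assumptions.

```python
import torch
from torch import nn


class LoRALinear(nn.Module):
    """Hypothetical wrapper: frozen nn.Linear plus a trainable low-rank update,
    y = W x + b + (alpha / rank) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # A projects down to `rank` dims (small random Gaussian init);
        # B projects back up and starts at zero, so at initialization the
        # wrapped layer computes exactly what the base layer computes.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


torch.manual_seed(0)
base = nn.Linear(768, 768)
layer = LoRALinear(base, rank=8)

x = torch.randn(4, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in layer.parameters() if not p.requires_grad)
print(trainable, frozen)                  # 12288 590592
print(torch.allclose(layer(x), base(x)))  # True: B is zero at initialization
```

Only `lora_A` and `lora_B` receive gradients and optimizer state, which is where the memory saving comes from; the forward pass still runs through the full frozen layer.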

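```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Feature extraction: freeze everything except the classifier head
for name, param in model.named_parameters():
    if "classifier" not in name:
        param.requires_grad = False

# Only classifier.weight and classifier.bias remain trainable: 768 * 2 + 2 = 1538
trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))

# Give the optimizer only the trainable parameters (learning rate is illustrative)
optimizer = torch.optim.AdamW(trainable, lr=1e-3)
```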
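Because a frozen backbone's outputs never change during training, a common variant of feature extraction computes the features once under `torch.no_grad()` and trains only the head on the cached tensors. The sketch below assumes a small hypothetical backbone and random toy data in place of a real pretrained model and dataset.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: a small "pretrained" backbone and a toy labelled dataset.
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
inputs = torch.randn(256, 32)
labels = torch.randint(0, 2, (256,))

# 1. Freeze the backbone and extract features once; no autograd graph is built.
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False
with torch.no_grad():
    features = backbone(inputs)  # shape (256, 64), reused for every epoch

# 2. Train only a new linear head on the cached features.
head = nn.Linear(64, 2)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

backbone_before = [p.clone() for p in backbone.parameters()]
for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(head(features), labels)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.3f}")
```

Caching is only valid when the backbone is truly frozen and inputs are not randomly augmented each epoch; with on-the-fly augmentation, features have to be recomputed every pass.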

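For full fine-tuning, a common refinement is to use a lower learning rate for the pretrained backbone (e.g. 1e-5) than for the freshly initialized head (e.g. 1e-4). This is called discriminative learning rates: the small backbone rate keeps pretrained weights close to where they started, which helps reduce catastrophic forgetting.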
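Discriminative learning rates map directly onto PyTorch optimizer parameter groups. In this sketch the toy model is a hypothetical stand-in whose head is named `classifier`, matching the naming used in the BERT example; the rates are the 1e-5 and 1e-4 suggested above.

```python
import torch
from torch import nn


class ToyClassifier(nn.Module):
    """Hypothetical stand-in for a pretrained backbone plus a new classification head."""

    def __init__(self, in_dim: int = 32, hidden: int = 64, num_labels: int = 2):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.backbone(x))


torch.manual_seed(0)
model = ToyClassifier()

# Split parameters by name: everything under "classifier" is the head, the rest is the backbone.
head_params = [p for n, p in model.named_parameters() if n.startswith("classifier")]
backbone_params = [p for n, p in model.named_parameters() if not n.startswith("classifier")]

# One optimizer, two parameter groups with different learning rates.
optimizer = torch.optim.AdamW([
    {"params": backbone_params, "lr": 1e-5},  # small steps: stay close to pretrained weights
    {"params": head_params, "lr": 1e-4},      # larger steps: the head starts from scratch
])

# One illustrative training step on random data.
x, y = torch.randn(8, 32), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

for group in optimizer.param_groups:
    print(group["lr"], sum(p.numel() for p in group["params"]))
```

With BERT loaded through `AutoModelForSequenceClassification`, the same name filter applies, since its head parameters are named `classifier.weight` and `classifier.bias`.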
