What is catastrophic forgetting and how does parameter-efficient fine-tuning help avoid it?

For: AI / LLM Engineer, ML Engineer, Research Engineer

The short answer

Catastrophic forgetting is when fine-tuning on a new task overwrites weights that encoded earlier capabilities, so the model loses skills it previously had. Parameter-efficient methods like LoRA freeze the base weights and train only a small set of added parameters, which preserves the original knowledge while adapting behavior. Lower learning rates, replay data, and adapter isolation reduce forgetting further.

How to think about it

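Start from what full fine-tuning does: it updates every weight in the model. Nothing in the new task's loss protects what the model already knows, so weights that supported earlier capabilities get pulled toward the new objective, and performance on the original tasks can drop sharply. That drop is catastrophic forgetting.

Parameter-efficient fine-tuning restricts where the update can happen. With LoRA, the pretrained weight matrix stays frozen and a small pair of low-rank matrices is trained alongside it, so each adapted layer outputs the frozen result plus a learned correction. Because the base weights never change, disabling or removing the adapter gives back the original model exactly, and the low rank limits how far the adapted model can drift from it.

Freezing the base reduces forgetting rather than ruling it out: with the adapter active, behavior on earlier tasks can still shift. Three habits help further. A lower learning rate keeps each update small. Replay data mixes examples from the original tasks into the fine-tuning set so the model keeps seeing them. Adapter isolation gives each task its own adapter, so training for one task never touches the parameters used by another.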
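
To make the freeze-and-add idea concrete, here is a minimal sketch in plain Python (standard library only, no ML framework). It is not how any particular library implements LoRA: the class name LoRALinear, the helper lora_param_count, the toy weights, and the learning rate are all assumptions chosen for illustration. It trains only the two small adapter matrices on a "new task" example, then checks that the frozen base weights are unchanged and that switching the adapter off restores the original output.

```python
import copy
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters in one rank-r adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


class LoRALinear:
    """Toy linear layer y = W0 x + (alpha / r) * B (A x), with W0 frozen."""

    def __init__(self, W0, r, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W0), len(W0[0])
        self.W0 = W0  # frozen base weights: no method below ever updates them
        self.scale = alpha / r
        # A starts small and random, B starts at zero, so the adapter
        # contributes nothing until training moves B away from zero.
        self.A = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x, use_adapter=True):
        y = matvec(self.W0, x)
        if use_adapter:
            delta = matvec(self.B, matvec(self.A, x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y

    def sgd_step(self, x, target, lr):
        """One SGD step on 0.5 * ||forward(x) - target||^2; only A and B change."""
        Ax = matvec(self.A, x)
        err = [p - t for p, t in zip(self.forward(x), target)]
        # dL/dB = scale * err (Ax)^T and dL/dA = scale * (B^T err) x^T
        grad_B = [[self.scale * e * a for a in Ax] for e in err]
        Bt_err = [sum(self.B[i][k] * err[i] for i in range(len(err)))
                  for k in range(len(self.A))]
        grad_A = [[self.scale * g * xi for xi in x] for g in Bt_err]
        self.B = [[b - lr * g for b, g in zip(b_row, g_row)]
                  for b_row, g_row in zip(self.B, grad_B)]
        self.A = [[a - lr * g for a, g in zip(a_row, g_row)]
                  for a_row, g_row in zip(self.A, grad_A)]


def loss(layer, x, target):
    return 0.5 * sum((p - t) ** 2 for p, t in zip(layer.forward(x), target))


# Toy "pretrained" weights (3 outputs, 4 inputs); the values are made up.
W0 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0]]
W0_before = copy.deepcopy(W0)
layer = LoRALinear(W0, r=2, alpha=2.0)

x_old = [1.0, 2.0, 3.0, 4.0]          # stands in for an "old task" input
y_old_before = layer.forward(x_old)   # base behavior, since B is still zero

x_new, y_new = [0.5, -1.0, 0.0, 2.0], [1.0, 0.0, -1.0]  # "new task" example
loss_before = loss(layer, x_new, y_new)
for _ in range(1000):
    layer.sgd_step(x_new, y_new, lr=0.02)
loss_after = loss(layer, x_new, y_new)

print("new-task loss:", loss_before, "->", loss_after)
print("base weights unchanged:", W0 == W0_before)
print("adapter off restores old output:",
      layer.forward(x_old, use_adapter=False) == y_old_before)
print("adapter params for a 4096x4096 layer at r=8:",
      lora_param_count(4096, 4096, 8), "vs full:", 4096 * 4096)
```

The toy layer is too small to show any savings, which is why the last line uses illustrative production-like dimensions instead: a rank-8 adapter on a 4096 x 4096 matrix trains 65,536 parameters against 16,777,216 for the full matrix. In real projects you would normally reach for an adapter library rather than hand-written gradients; the point here is only that the update lives entirely in A and B.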
