What is LoRA and how does it make fine-tuning parameter-efficient?

For AI / LLM Engineer ML Engineer research-engineer

The short answer

LoRA freezes the pretrained weights and injects small trainable low-rank matrices into selected layers, learning the weight update as their low-rank product. This trains a tiny fraction of parameters, slashing memory and storage while approximating full fine-tuning, and the adapters can be merged back at inference.

How to think about it

Learn it properly LoRA & QLoRA fine-tuning

Keep practising

How does LoRA work and why is it preferred over full fine-tuning for large models? What distinguishes QLoRA from LoRA? What is catastrophic forgetting and how does parameter-efficient fine-tuning help avoid it? What is the difference between pretraining, fine-tuning, instruction-tuning, and RLHF? What's the difference between full retraining, incremental (warm-start) training, and continual online learning?

All NLP & LLMs questions

Explore further

Fine-tuning: LoRA & QLoRA Fine-tune vs RAG: the decision Scaling laws

LoRA Fine-Tuning Optimizer L1 Regularization (Lasso)