NLP & LLMs Medium
What is LoRA and how does it make fine-tuning parameter-efficient?
The short answer
LoRA freezes the pretrained weights and injects small trainable low-rank matrices into selected layers, learning the weight update as their low-rank product. This trains a tiny fraction of parameters, slashing memory and storage while approximating full fine-tuning, and the adapters can be merged back at inference.
How to think about it
LoRA freezes the pretrained weights and injects small trainable low-rank matrices into selected layers, learning the weight update as their low-rank product. This trains a tiny fraction of parameters, slashing memory and storage while approximating full fine-tuning, and the adapters can be merged back at inference.