NLP & LLMs Hard
What distinguishes QLoRA from LoRA?
The short answer
QLoRA quantizes the frozen base model to 4-bit using NF4 and double quantization, then trains LoRA adapters on top, with paged optimizers to handle memory spikes. This dramatically lowers the memory footprint, enabling fine-tuning of large models on a single consumer GPU with little quality loss versus 16-bit LoRA.
How to think about it
QLoRA quantizes the frozen base model to 4-bit using NF4 and double quantization, then trains LoRA adapters on top, with paged optimizers to handle memory spikes. This dramatically lowers the memory footprint, enabling fine-tuning of large models on a single consumer GPU with little quality loss versus 16-bit LoRA.