datarekha

What distinguishes QLoRA from LoRA?

The short answer

QLoRA quantizes the frozen base model to 4-bit using NF4 and double quantization, then trains LoRA adapters on top, with paged optimizers to handle memory spikes. This dramatically lowers the memory footprint, enabling fine-tuning of large models on a single consumer GPU with little quality loss versus 16-bit LoRA.

How to think about it

QLoRA quantizes the frozen base model to 4-bit using NF4 and double quantization, then trains LoRA adapters on top, with paged optimizers to handle memory spikes. This dramatically lowers the memory footprint, enabling fine-tuning of large models on a single consumer GPU with little quality loss versus 16-bit LoRA.

Learn it properly LoRA & QLoRA fine-tuning

Keep practising

All NLP & LLMs questions

Explore further

Skip to content