datarekha

What is gradient clipping and when would you use it?

The short answer

Gradient clipping caps the magnitude of gradients (by value or by global norm) before the optimizer step, preventing exploding gradients that cause unstable or diverging training. It is especially useful in RNNs and transformers, where a single large update can destabilize learning.

How to think about it

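Think of a gradient as a step direction plus a step size. When gradients explode, the step size becomes enormous, so a single update can throw the weights far from a good region and the loss spikes or turns into NaN. Clipping bounds that step size. Clipping by global norm measures the norm of all parameter gradients taken together and, if it exceeds a threshold, rescales every gradient by the same factor, so the update direction is preserved. Clipping by value clamps each gradient component independently to a fixed range, which is simpler but can change the direction of the update. Reach for clipping when training shows sudden loss spikes or divergence, which is common in RNNs, where gradients are multiplied through many time steps, and in transformers.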
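The two clipping modes named above, by value and by global norm, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: gradients are represented as a list of flat lists of floats (one list per parameter), and the function names, the example thresholds, and the small epsilon in the denominator are hypothetical choices for this sketch, not part of any particular library.

```python
import math

def clip_by_value(grads, clip_value):
    """Clamp every gradient component to [-clip_value, clip_value]."""
    return [[max(-clip_value, min(clip_value, g)) for g in param]
            for param in grads]

def global_norm(grads):
    """L2 norm of all gradient components across all parameters."""
    return math.sqrt(sum(g * g for param in grads for g in param))

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale all gradients by one shared factor if their global norm
    exceeds max_norm; leave them unchanged otherwise."""
    norm = global_norm(grads)
    scale = min(1.0, max_norm / (norm + eps))  # eps (assumed) avoids division by zero
    return [[g * scale for g in param] for param in grads]

# Toy gradients for two parameters; global norm = sqrt(9 + 16 + 144) = 13
grads = [[3.0, -4.0], [12.0]]

by_value = clip_by_value(grads, 5.0)       # only the 12.0 component is clamped
by_norm = clip_by_global_norm(grads, 1.0)  # every component shrinks by the same factor
```

Note that the by-value result changes the ratio between components, while the by-norm result keeps it. In PyTorch the corresponding utilities are `torch.nn.utils.clip_grad_value_` and `torch.nn.utils.clip_grad_norm_`, typically called after `loss.backward()` and before `optimizer.step()`.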

