datarekha

What is the vanishing gradient problem, and how do you address it?

The short answer

Vanishing gradients occur when gradients shrink toward zero as they propagate back through many layers, so early layers learn extremely slowly or not at all; it is common with sigmoid or tanh activations in deep networks. Mitigations include ReLU-family activations, residual/skip connections, batch or layer normalization, careful initialization, and gated architectures like LSTMs.

How to think about it

Vanishing gradients occur when gradients shrink toward zero as they propagate back through many layers, so early layers learn extremely slowly or not at all; it is common with sigmoid or tanh activations in deep networks. Mitigations include ReLU-family activations, residual/skip connections, batch or layer normalization, careful initialization, and gated architectures like LSTMs.

Learn it properly Vanishing & exploding gradients

Keep practising

All Deep Learning questions

Explore further

Skip to content