Deep Learning Medium
What regularization techniques do you know for deep networks, and how do they prevent overfitting?
The short answer
Common techniques include L1 and L2 weight penalties, dropout (randomly zeroing activations during training), early stopping, data augmentation, and label smoothing. They reduce overfitting by constraining the model's effective capacity or by injecting noise into training, so the network cannot simply memorize the training set.
How to think about it
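It helps to group the techniques by where they act. L1 and L2 penalties act on the weights: they add a term to the loss that grows with weight magnitude, so L2 shrinks all weights toward zero and L1 pushes many of them to exactly zero. Dropout acts on the activations: zeroing a random subset at each training step keeps units from co-adapting, and it is switched off at inference. Early stopping acts on the optimization: training halts once validation loss stops improving, before the network starts fitting noise. Data augmentation acts on the inputs: label-preserving transformations such as crops or flips enlarge the effective training set. Label smoothing acts on the targets: replacing hard 0/1 labels with slightly softened ones discourages overconfident predictions. In each case the network has a harder time memorizing individual training examples, which is what narrows the gap between training and validation performance.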
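To make the mechanics concrete, here is a minimal sketch of four of these techniques in plain Python, with no framework. The function names, the `lam`, `p`, `eps`, and `patience` parameters, and the choice of inverted dropout (scaling surviving activations by 1/(1-p)) are assumptions made for illustration; real libraries expose the same ideas through their own APIs.

```python
import random


def l1_penalty(weights, lam):
    """L1 penalty: lam * sum of absolute weights, added to the training loss."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty: lam * sum of squared weights, added to the training loss."""
    return lam * sum(w * w for w in weights)


def dropout(activations, p, rng, training=True):
    """Inverted dropout (assumed variant): zero each unit with probability p
    and scale the survivors by 1 / (1 - p). Identity at inference time."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def smooth_labels(one_hot, eps):
    """Label smoothing: mix the one-hot target with a uniform distribution."""
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]


def early_stopping_epoch(val_losses, patience):
    """Return the epoch with the best validation loss, stopping the scan once
    `patience` consecutive epochs have failed to improve on it."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the dropout mask is reproducible
    print(l2_penalty([1.0, -2.0], lam=0.5))
    print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=rng))
    print(smooth_labels([0.0, 1.0, 0.0, 0.0], eps=0.1))
    print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6], patience=2))
```

Note that in the last call the loss would have improved again at the final epoch, but early stopping with a patience of 2 never sees it. That is the usual trade-off when choosing the patience value. Data augmentation is left out because it depends on the input type (images, text, audio).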