Machine Learning | Easy | Asked at Google, Meta, Netflix, Uber

What are overfitting and underfitting, and how do you fix each?

For: Data Scientist, ML Engineer, AI / LLM Engineer

The short answer

Overfitting occurs when a model memorizes training noise and fails to generalize; underfitting occurs when the model is too simple to capture the true signal. Fixes differ: overfitting requires regularization, more data, or reduced complexity; underfitting requires a more expressive model or better features.

How to think about it

Underfitting — training error is high because the model lacks capacity to represent the target function. A linear model fit to sinusoidal data is the canonical example.

Overfitting — training error is very low but validation/test error is high. The model has captured noise specific to the training set rather than the underlying distribution.

Comparing training and validation loss is the primary diagnostic. Look at both the level of each and the gap between them:

High train error + high val error → underfit
Low train error + high val error → overfit
Low train error + low val error → good generalization
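
The decision table above can be sketched as a small helper. This is a minimal illustration, not a standard API: the function name `diagnose` and the thresholds `high_error` and `max_gap` are hypothetical, and sensible values depend entirely on the task and the error metric.

```python
def diagnose(train_error, val_error, high_error=0.20, max_gap=0.05):
    """Classify a fit from its training and validation error.

    high_error and max_gap are hypothetical, task-specific thresholds.
    """
    if train_error >= high_error:
        # The model cannot even fit the data it was trained on.
        return "underfit"
    if val_error - train_error > max_gap:
        # Fits the training set well, but that does not carry over to held-out data.
        return "overfit"
    return "good generalization"


# Made-up (train, val) error pairs, one per row of the table.
for train_err, val_err in [(0.35, 0.37), (0.02, 0.30), (0.04, 0.06)]:
    print(train_err, val_err, "->", diagnose(train_err, val_err))
```

Checking the training error first matters: a model with high error on both sets has a small gap, so a gap-only check would miss underfitting.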

Figure: underfitting, good fit, and overfitting illustrated on the same dataset.

Fixes for overfitting:

Regularization: L1 (Lasso), L2 (Ridge), dropout in neural nets
Early stopping (monitor val loss, stop when it plateaus/rises)
Reduce model complexity (fewer layers, lower polynomial degree)
Get more training data or apply data augmentation
Ensemble methods that average many high-variance models (e.g. bagging)
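
Of the fixes above, early stopping is the easiest to show in a few lines. The sketch below assumes you already have one validation loss per epoch. The function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are illustrative assumptions rather than any particular library's API.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Never triggered: training ran to the final epoch.
    return best_epoch, len(val_losses) - 1


# Made-up curve: validation loss falls, bottoms out, then rises as the model overfits.
val_losses = [0.90, 0.60, 0.45, 0.40, 0.41, 0.43, 0.47, 0.52]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
print("best epoch:", best_epoch, "stopped at epoch:", stop_epoch)
```

In a real training loop you would typically also restore the model weights saved at `best_epoch`, since the epochs between the best point and the stop have already started to overfit.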

Fixes for underfitting:

Increase model capacity (deeper network, higher-degree polynomial)
Add informative features / feature engineering
Reduce regularization strength
Train longer or tune the learning rate (the optimizer may simply not have converged)
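
To make the feature-engineering fix concrete, here is a minimal sketch using the linear-model-on-sinusoidal-data example from earlier. The data is noise-free and the engineered feature `sin(2*pi*x)` is chosen to match the target exactly. Both are simplifying assumptions for illustration, and the helper names `fit_line` and `mse` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mse(xs, ys, intercept, slope):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: one full period of a sine wave, no noise (an assumption for clarity).
xs = [i / 50 for i in range(50)]
ys = [math.sin(2 * math.pi * x) for x in xs]

# 1) Straight line on the raw input: too little capacity, so training error stays high.
a_raw, b_raw = fit_line(xs, ys)
mse_raw = mse(xs, ys, a_raw, b_raw)

# 2) The same linear model, but fit on an engineered feature z = sin(2*pi*x).
zs = [math.sin(2 * math.pi * x) for x in xs]
a_feat, b_feat = fit_line(zs, ys)
mse_feat = mse(zs, ys, a_feat, b_feat)

print(f"raw input, training MSE:          {mse_raw:.4f}")
print(f"engineered feature, training MSE: {mse_feat:.2e}")
```

The model class never changes here. Only the input representation does, which is why better features are listed as a fix for underfitting alongside increasing capacity.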
