Machine Learning | Medium | Asked at Google, OpenAI, Meta, DeepMind

What is the difference between discriminative and generative models, and when would you prefer each?

For: Data Scientist, ML Engineer, AI / LLM Engineer

The short answer

Discriminative models learn the conditional distribution P(y|x) directly and focus entirely on the decision boundary; generative models learn the joint distribution P(x,y) and can generate new samples. Discriminative models typically achieve higher classification accuracy with sufficient labeled data; generative models excel when data is scarce, you need to synthesize data, or the problem requires modeling the input distribution.

How to think about it

The distinction lies in which probability distribution the model estimates.

Discriminative models — directly model P(y | x), the probability of the label given input. They learn the boundary between classes without caring about the distribution of inputs.

Examples: logistic regression, SVMs, random forests, gradient boosting, most neural network classifiers.
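To make "learning P(y | x) directly" concrete, here is a minimal sketch of one-dimensional logistic regression fitted by plain gradient descent. The toy data, learning rate, step count, and function names are illustrative assumptions, not part of the original answer. Note that nothing in the model describes how x itself is distributed.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_1d(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals p_i - y_i drive both gradients.
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data (made up for illustration): class 0 sits near 1, class 1 near 4.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic_1d(xs, ys)
boundary = -b / w  # the x at which P(y=1 | x) = 0.5

def predict_proba(x):
    """Estimated P(y=1 | x); the model has no notion of P(x)."""
    return sigmoid(w * x + b)
```

The only thing the fit produces is a decision boundary and a conditional probability on either side of it, which is exactly the discriminative viewpoint.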

Generative models — model the joint P(x, y) = P(x | y) P(y), which implies P(y | x) via Bayes’ rule. Because they model P(x | y), they can generate new examples from a given class.
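Written out, the Bayes' rule step that turns a generative model into a classifier looks like this (the symbol y' is just a summation index over classes, and \hat{y} denotes the predicted label):

```latex
P(y \mid x) \;=\; \frac{P(x \mid y)\,P(y)}{P(x)}
           \;=\; \frac{P(x \mid y)\,P(y)}{\sum_{y'} P(x \mid y')\,P(y')},
\qquad
\hat{y} \;=\; \arg\max_{y}\; P(x \mid y)\,P(y).
```

The denominator P(x) is the same for every class, so it can be dropped when only the most likely label is needed.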

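Examples: Naive Bayes, Linear Discriminant Analysis, Gaussian Mixture Models, Hidden Markov Models, VAEs, GANs, diffusion models. When trained without labels, models such as GMMs, VAEs, GANs, and diffusion models capture P(x) alone rather than a joint with class labels.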
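As a rough sketch of the generative recipe, the snippet below estimates P(y) and a one-dimensional Gaussian P(x | y) per class, classifies through Bayes' rule, and then draws a new example from a chosen class. The toy data, the Gaussian assumption, and all function names are illustrative choices rather than anything prescribed by the answer above.

```python
import math
import random
from statistics import mean, pstdev

def fit_gaussian_generative(xs, ys):
    """Estimate the prior P(y) and a 1D Gaussian P(x | y) for each class."""
    params = {}
    for label in sorted(set(ys)):
        members = [x for x, y in zip(xs, ys) if y == label]
        params[label] = {
            "prior": len(members) / len(xs),
            "mu": mean(members),
            "sigma": pstdev(members),  # population std; assumes it is non-zero
        }
    return params

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, params):
    """Bayes' rule: P(y | x) = P(x | y) P(y) / sum_k P(x | k) P(k)."""
    joint = {k: p["prior"] * gaussian_pdf(x, p["mu"], p["sigma"]) for k, p in params.items()}
    evidence = sum(joint.values())
    return {k: v / evidence for k, v in joint.items()}

def sample(label, params, rng):
    """Generate a new x from the learned class-conditional P(x | y=label)."""
    p = params[label]
    return rng.gauss(p["mu"], p["sigma"])

# Toy data (made up for illustration).
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

params = fit_gaussian_generative(xs, ys)
rng = random.Random(0)          # seeded only so runs are repeatable
new_x = sample(1, params, rng)  # a synthetic class-1 example
```

The same fitted parameters serve both purposes: `posterior` classifies, and `sample` generates, which a purely discriminative model cannot do.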

Side-by-side comparison:

Criterion	Discriminative	Generative
Models	P(y | x)	P(x, y)
Data generation	Cannot	Can sample new x
Labeled data efficiency	Needs more labeled data	Can leverage unlabeled data (semi-supervised)
Typical accuracy	Higher (asymptotically)	Lower on pure classification
Missing features	Harder to handle	Natural: integrate out missing dims
Anomaly detection	Indirect	Direct via low P(x)
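The last two rows of the comparison can be illustrated with a small sketch. It assumes a two-class, two-feature Gaussian naive Bayes model with hand-picked parameters and an arbitrary density threshold; all of these values and names are hypothetical. Under the independence assumption, a missing feature integrates to 1 and simply drops out of the product, and an input is flagged as anomalous when its marginal density P(x) is very low.

```python
import math

# Hypothetical, hand-picked parameters: per class, a prior and a
# (mean, std) pair for each of two features.
MODEL = {
    "a": {"prior": 0.5, "features": [(0.0, 1.0), (0.0, 1.0)]},
    "b": {"prior": 0.5, "features": [(4.0, 1.0), (4.0, 1.0)]},
}

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def joint(x, label):
    """P(x_observed, y). Features given as None are integrated out: with
    independent features each missing factor integrates to 1, so it is skipped."""
    p = MODEL[label]["prior"]
    for value, (mu, sigma) in zip(x, MODEL[label]["features"]):
        if value is not None:
            p *= gaussian_pdf(value, mu, sigma)
    return p

def marginal(x):
    """P(x_observed) = sum over classes of P(x_observed, y)."""
    return sum(joint(x, label) for label in MODEL)

def posterior(x):
    evidence = marginal(x)
    return {label: joint(x, label) / evidence for label in MODEL}

def is_anomaly(x, threshold=1e-4):
    """Flag inputs whose density under the model falls below an assumed threshold."""
    return marginal(x) < threshold

partial = posterior([3.8, None])      # second feature is missing
typical = is_anomaly([0.2, -0.1])     # close to class "a"
outlier = is_anomaly([10.0, -6.0])    # far from both classes
```

In practice the threshold would be tuned on held-out data; a discriminative classifier has no P(x) to consult and would need a separate mechanism for either task.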

Practical guidance:

For pure classification with abundant labeled data, use discriminative models (gradient boosting, neural nets).
For data augmentation, image synthesis, or density estimation, use generative models (diffusion, VAE).
For small labeled datasets with large unlabeled pools, semi-supervised generative approaches can close the gap.
Naive Bayes (generative) is surprisingly competitive for text classification with small corpora despite its conditional-independence assumption.
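To show why Naive Bayes is so easy to apply to small text corpora, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing. The four-document corpus, the whitespace tokenizer, and the function names are made up for illustration; a real pipeline would use a proper tokenizer and far more data.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (text, label). Returns class counts, per-class word counts, vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab, alpha=1.0):
    """Return argmax_y of log P(y) + sum_w log P(w | y), with add-alpha smoothing."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][token] + alpha) / (total_words + alpha * len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

# Tiny made-up corpus.
corpus = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project update and meeting notes", "ham"),
]

model = train_naive_bayes(corpus)
label_1 = predict("buy cheap money", *model)
label_2 = predict("monday meeting notes", *model)
```

Training is a single counting pass with no iterative optimization, which is part of why the method holds up with very few documents.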

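Andrew Ng and Michael Jordan (2002) showed that logistic regression (discriminative) has asymptotic error no higher than that of Naive Bayes (generative), but that Naive Bayes approaches its own (possibly higher) asymptotic error much faster: on the order of log n training examples versus n for logistic regression, where n is the number of features. In practice this means Naive Bayes can win on small training sets, with logistic regression overtaking it as the data grows.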
