What is the difference between discriminative and generative models, and when would you prefer each?
Discriminative models learn the conditional distribution P(y|x) directly and focus entirely on the decision boundary; generative models learn the joint distribution P(x,y) and can generate new samples. Discriminative models typically achieve higher classification accuracy with sufficient labeled data; generative models excel when data is scarce, you need to synthesize data, or the problem requires modeling the input distribution.
How to think about it
The distinction lies in which probability distribution the model estimates.
Discriminative models directly model P(y | x), the probability of the label given the input. They learn the boundary between classes without modeling how the inputs themselves are distributed.
Examples: logistic regression, SVMs, random forests, gradient boosting, most neural network classifiers.
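
To make "directly model P(y | x)" concrete, here is a minimal sketch of one-dimensional logistic regression fitted by gradient descent. The toy data, the helper names (`sigmoid`, `fit_logistic`), and the learning rate and epoch count are all illustrative assumptions, not a reference implementation. Note that nothing in the code describes how x itself is distributed.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.4, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals (predicted probability minus label) drive both gradients.
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy 1-D data (made up for illustration): class 0 sits near 1, class 1 near 4.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 1.0 + b)   # estimated P(y=1 | x=1.0)
p_high = sigmoid(w * 4.0 + b)  # estimated P(y=1 | x=4.0)
boundary = -b / w              # the x at which P(y=1 | x) = 0.5
print(f"P(y=1|x=1.0)={p_low:.3f}  P(y=1|x=4.0)={p_high:.3f}  boundary={boundary:.2f}")
```

The only thing learned is the pair (w, b), which places the decision boundary; the model cannot say how likely a given x is, nor produce a new one.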
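Generative models model the joint P(x, y) = P(x | y) P(y), from which P(y | x) follows via Bayes’ rule: P(y | x) = P(x | y) P(y) / P(x). Because they model P(x | y), they can generate new examples from a given class. Unsupervised generative models drop the label and model P(x) alone.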
Examples: Naive Bayes, Linear Discriminant Analysis, Gaussian Mixture Models, Hidden Markov Models, VAEs, GANs, diffusion models.
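
As a rough sketch of the generative route, the snippet below fits a prior P(y) and a one-dimensional Gaussian P(x | y) per class, recovers P(y | x) with Bayes' rule, and draws a new sample from a chosen class. The toy data and the function names (`fit_generative`, `posterior`, `sample`) are assumptions made for illustration.

```python
import math
import random
from statistics import mean, pstdev

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_generative(xs, ys):
    """Estimate P(y) and a Gaussian P(x | y) for each class."""
    params = {}
    for c in sorted(set(ys)):
        xc = [x for x, y in zip(xs, ys) if y == c]
        params[c] = {"prior": len(xc) / len(xs), "mu": mean(xc), "sigma": pstdev(xc)}
    return params

def posterior(x, params):
    """P(y | x) = P(x | y) P(y) / P(x), with P(x) obtained by summing the joint over y."""
    joint = {c: p["prior"] * gaussian_pdf(x, p["mu"], p["sigma"]) for c, p in params.items()}
    evidence = sum(joint.values())  # this is P(x)
    return {c: j / evidence for c, j in joint.items()}

def sample(c, params, rng):
    """Draw a new x from the fitted class-conditional P(x | y=c)."""
    return rng.gauss(params[c]["mu"], params[c]["sigma"])

# Toy 1-D data (made up for illustration).
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

params = fit_generative(xs, ys)
post = posterior(1.0, params)                # class probabilities for x = 1.0
new_x = sample(1, params, random.Random(0))  # a synthetic class-1 example
print(post, new_x)
```

The same fitted parameters serve both classification and sampling, which is the practical payoff of modeling the joint.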
Side-by-side comparison:
| Criterion | Discriminative | Generative |
|---|---|---|
| Models | P(y \| x) | P(x, y) |
| Data generation | Cannot | Can sample new x |
| Labeled data efficiency | Needs more labeled data | Can leverage unlabeled data (semi-supervised) |
| Typical accuracy | Higher (asymptotically) | Lower on pure classification |
| Missing features | Harder to handle | Natural: integrate out missing dims |
| Anomaly detection | Indirect | Direct via low P(x) |
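
The last two rows of the table can be sketched with a small Gaussian Naive Bayes model. Under the conditional-independence assumption, integrating out a missing feature amounts to dropping its factor from the product, and the marginal P(x) doubles as an anomaly score. The parameters in `MODEL` are hypothetical, hand-picked values rather than fitted ones, and the function names are illustrative.

```python
import math

# Hypothetical parameters: two classes, two features, each feature given
# as (mean, std) under a Naive Bayes (independent features) assumption.
MODEL = {
    "a": {"prior": 0.5, "feats": [(0.0, 1.0), (0.0, 1.0)]},
    "b": {"prior": 0.5, "feats": [(4.0, 1.0), (4.0, 1.0)]},
}

def log_gauss(x, mu, sigma):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def log_joint(x, c):
    """log P(x_observed, y=c). A missing feature (None) is skipped, because
    integrating its normalized density over all values contributes a factor of 1."""
    total = math.log(MODEL[c]["prior"])
    for xj, (mu, sigma) in zip(x, MODEL[c]["feats"]):
        if xj is not None:
            total += log_gauss(xj, mu, sigma)
    return total

def log_marginal(x):
    """log P(x_observed) = log sum_c P(x_observed, y=c), computed with log-sum-exp."""
    lj = [log_joint(x, c) for c in MODEL]
    m = max(lj)
    return m + math.log(sum(math.exp(v - m) for v in lj))

def posterior(x):
    """P(y | x_observed) for every class."""
    lm = log_marginal(x)
    return {c: math.exp(log_joint(x, c) - lm) for c in MODEL}

post_missing = posterior([3.8, None])        # second feature is missing
score_typical = log_marginal([0.2, -0.1])    # close to the class "a" mean
score_outlier = log_marginal([10.0, -6.0])   # far from both classes
print(post_missing, score_typical, score_outlier)
```

A point would be flagged as anomalous when its log P(x) falls below some threshold; choosing that threshold is application-specific and not shown here.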
Practical guidance:
- For pure classification with abundant labeled data, use discriminative models (gradient boosting, neural nets).
- For data augmentation, image synthesis, or density estimation, use generative models (diffusion, VAE).
- For small labeled datasets with large unlabeled pools, semi-supervised generative approaches can close the gap.
- Naive Bayes (generative) is surprisingly competitive for text classification with small corpora despite its conditional-independence assumption.
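
To illustrate the last bullet, here is a minimal multinomial Naive Bayes text classifier with Laplace (add-alpha) smoothing. The four-document corpus, the labels, and the function names (`train_nb`, `predict_nb`) are made up for illustration; a real pipeline would need proper tokenization and far more data.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, alpha=1.0):
    """docs: list of (text, label). Returns log-priors, smoothed log-likelihoods, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Add-alpha smoothing keeps unseen (word, class) pairs from getting zero probability.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik, vocab

def predict_nb(text, model):
    """Return the class maximizing log P(y) + sum of log P(word | y)."""
    log_prior, log_lik, vocab = model
    scores = {}
    for c in log_prior:
        # Words outside the training vocabulary are ignored.
        known = [w for w in text.lower().split() if w in vocab]
        scores[c] = log_prior[c] + sum(log_lik[c][w] for w in known)
    return max(scores, key=scores.get)

# Tiny made-up corpus.
docs = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train_nb(docs)
label_1 = predict_nb("buy cheap money", model)
label_2 = predict_nb("agenda for tomorrow", model)
print(label_1, label_2)
```

Each class needs only word counts and a prior, which is one reason the model can be estimated from very few documents.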
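Andrew Ng and Michael Jordan (2002) showed that logistic regression (discriminative) has an asymptotic error no higher than that of Naive Bayes (generative), but that Naive Bayes approaches its own asymptotic error faster: with n features, on the order of log n training examples suffice, versus on the order of n for logistic regression. With few examples, Naive Bayes can therefore come out ahead before logistic regression overtakes it as the training set grows.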