What is the difference between supervised, unsupervised, and reinforcement learning?
Supervised learning trains on labeled input-output pairs to predict a target. Unsupervised learning finds structure in unlabeled data. Reinforcement learning trains an agent to maximize cumulative reward through trial-and-error interaction with an environment.
How to think about it
The three paradigms differ in the signal that drives learning: labels, the structure of the data itself, or rewards.
Supervised learning — every training example carries a ground-truth label y. The model learns a mapping f(x) → y by minimizing a loss between predictions and labels. Examples: spam detection (label = spam/not-spam), house-price regression (label = sale price).
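To make the "minimize a loss between predictions and labels" step concrete, here is a minimal sketch of house-price regression in plain Python. The dataset, the function name fit_linear, and the learning-rate and epoch settings are all assumptions chosen for illustration; real projects would normally use a library rather than hand-written gradient descent.

```python
# Toy labeled dataset (made-up values): x = floor area in units of 100 m^2,
# y = sale price in units of $100k. The labels follow y = 2*x + 0.5 exactly.
labeled_data = [(0.5, 1.5), (1.0, 2.5), (1.5, 3.5), (2.0, 4.5)]


def fit_linear(pairs, lr=0.1, epochs=2000):
    """Fit f(x) = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit_linear(labeled_data)
predicted_price = w * 1.2 + b  # prediction for an unseen input x = 1.2
```

Every update is driven by the gap between the prediction w*x + b and the known label y; without the labels there would be nothing to compute the gradients from.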
Unsupervised learning — no labels exist. The algorithm discovers latent structure: clusters (k-means), lower-dimensional representations (PCA, autoencoders), or density estimates. Example: segmenting customers by purchase behavior without pre-defined groups.
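As an illustration of discovering clusters without labels, the following is a small sketch of k-means (Lloyd's algorithm) on made-up customer data. The feature choice (visits per month, average spend), the data values, and the explicit initial centroids are assumptions for the example; production code would typically scale the features and use a smarter initialization.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iters=100):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical customers: (visits per month, average spend). No labels given.
customers = [(1.0, 10.0), (1.5, 12.0), (2.0, 11.0),
             (8.0, 50.0), (9.0, 55.0), (8.5, 52.0)]
labels, centroids = kmeans(customers, [customers[0], customers[-1]])
```

The cluster indices in labels are produced by the algorithm, not supplied by a human; which index means "occasional shopper" and which means "frequent shopper" is an interpretation made afterwards.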
Reinforcement learning (RL) — an agent observes state s, takes action a, receives a scalar reward r, and transitions to a new state s'. It learns a policy π(s) → a that maximizes expected cumulative (discounted) reward. No explicit target action is given; the reward signal is often delayed and sparse, so the agent must work out which earlier actions deserve the credit. Examples: AlphaGo, robotics control, recommendation systems tuned for engagement.
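The state, action, reward loop can be sketched with tabular Q-learning, which is one of many RL algorithms and is swapped in here only because it fits in a few lines. The five-cell corridor environment, the step function, and all hyperparameters are hypothetical choices for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; the agent starts in cell 0
GOAL = N_STATES - 1   # reaching cell 4 ends the episode with reward 1
LEFT, RIGHT = 0, 1


def step(state, action):
    """Hypothetical environment: move one cell; walls clamp the position."""
    move = 1 if action == RIGHT else -1
    next_state = min(max(state + move, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so the untrained agent does not favor one side.
    return rng.choice([a for a, value in enumerate(q_row) if value == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = choose_action(q[state], epsilon, rng)
            next_state, reward = step(state, action)
            # The goal row is never updated, so its max stays 0 (terminal).
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
policy = ["right" if q_table[s][RIGHT] > q_table[s][LEFT] else "left"
          for s in range(GOAL)]
```

Only the final move into the goal cell is ever rewarded, yet the learned values for earlier cells still come to favor moving right: the discounted reward is propagated backwards through the table one update at a time.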
A common hybrid is self-supervised learning, used to pretrain LLMs: labels are derived automatically from the data itself (e.g., next-token prediction), so it scales like unsupervised learning (no human annotation needed) while training like supervised learning (a loss against explicit targets).
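A rough sketch of how next-token prediction turns raw text into (input, target) pairs is shown below. The corpus, the helper names make_pairs and predict_next, and the bigram counting model are assumptions for illustration; an LLM uses a neural network over long contexts, not a count table, but the way the labels are derived is the same idea.

```python
from collections import Counter, defaultdict


def make_pairs(tokens):
    """Derive (input, target) pairs from unlabeled text: each token's
    target is simply the token that follows it."""
    return list(zip(tokens[:-1], tokens[1:]))


corpus = "the cat sat on the mat and the cat slept"  # made-up raw text
tokens = corpus.split()
pairs = make_pairs(tokens)  # first pair: ("the", "cat")

# A stand-in "model": count which token follows each token.
counts = defaultdict(Counter)
for prev_token, next_token in pairs:
    counts[prev_token][next_token] += 1


def predict_next(token):
    """Return the most frequent successor, or None for an unseen token."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]
```

No one annotated this corpus, yet every position supplies a training target, which is why the approach scales to very large text collections.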