Machine Learning | Easy | Asked at: Amazon, Microsoft, Apple

What is the difference between classification and regression, and how do you choose between them?

For: Data Scientist, ML Engineer, Data Analyst

The short answer

Classification predicts a discrete class label; regression predicts a continuous numeric value. The choice is determined by the nature of the target variable, not by the algorithm family — many algorithms (e.g., decision trees, neural nets) handle both.

How to think about it

The distinction is about the output space.

Classification — the target y is a category drawn from a finite set. Binary classification has two classes (fraud / not-fraud); multi-class has more (digit 0–9); multi-label allows multiple simultaneous classes (image tags). The model typically outputs a probability distribution over classes, and a threshold or argmax converts it to a label. Key metrics: accuracy, precision, recall, F1, AUC-ROC.
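As a minimal sketch of the probability-to-label step described above, the snippet below fits a logistic regression on synthetic data and converts its predicted probabilities to labels twice: once with argmax and once with a stricter cut-off. The dataset, the train/test split, and the 0.9 threshold are illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary problem; the data and the 0.9 cut-off are illustrative only.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# One row per sample, one column per class; each row sums to 1.
proba = clf.predict_proba(X_test)

# argmax picks the most probable class (a 0.5 cut-off in the binary case).
labels_argmax = clf.classes_[np.argmax(proba, axis=1)]

# A stricter threshold on the positive class flags fewer samples as positive.
labels_strict = (proba[:, 1] >= 0.9).astype(int)

print("accuracy:", accuracy_score(y_test, labels_argmax))
print("F1:      ", f1_score(y_test, labels_argmax))
print("AUC-ROC: ", roc_auc_score(y_test, proba[:, 1]))  # scored on probabilities, not labels
```

Raising the threshold usually trades recall for precision, which is why the cut-off is often tuned to the cost of each kind of error rather than left at 0.5.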

Regression — the target y is a real number (or vector of real numbers). Predicting tomorrow’s closing price, estimating a patient’s blood-glucose level, or forecasting demand in units are all regression problems. Key metrics: MAE, RMSE, R².
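The regression metrics named above can be computed as in the sketch below. It assumes a synthetic dataset from make_regression and a simple held-out split; the sample size and noise level are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic continuous target; sample size and noise level are arbitrary.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)  # real-valued predictions, not labels

mae = mean_absolute_error(y_test, y_pred)           # average absolute error
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # penalises large errors more heavily
r2 = r2_score(y_test, y_pred)                       # share of variance explained

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

MAE and RMSE are in the units of the target, so they are only comparable across models evaluated on the same data; R² is unitless.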

How to decide:

Signal	Use
Target is a label or category	Classification
Target is a quantity on a continuous scale	Regression
Target is an ordered category (poor/fair/good)	Ordinal regression or classification with ordered labels
Target is a count (non-negative integer)	Poisson or other count regression rather than ordinary least-squares regression
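To illustrate the count row of the table, the sketch below compares ordinary least squares with scikit-learn's PoissonRegressor on simulated counts. The generating process (a log-linear rate with made-up coefficients) and the small alpha are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, PoissonRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

# Hypothetical generating process: the log of the expected count is linear in X.
rate = np.exp(0.5 + 0.8 * X[:, 0] - 0.6 * X[:, 1])
y = rng.poisson(rate)  # non-negative integer counts

ols = LinearRegression().fit(X, y)
poisson = PoissonRegressor(alpha=1e-3).fit(X, y)  # log link, light L2 penalty

ols_pred = ols.predict(X)
poisson_pred = poisson.predict(X)

# OLS is unconstrained and can predict negative counts;
# the log link keeps Poisson predictions strictly positive.
print("min OLS prediction:    ", ols_pred.min())
print("min Poisson prediction:", poisson_pred.min())
```

The point of the comparison is the output range: a count model respects the fact that the target cannot go below zero, which a plain linear fit does not.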

Some problems admit both framings: predicting whether revenue exceeds $1 M is classification; predicting the revenue itself is regression. Choose the framing that matches the downstream decision.

from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.datasets import make_classification, make_regression

X_c, y_c = make_classification(n_samples=500, random_state=0)
clf = LogisticRegression().fit(X_c, y_c)          # discrete output

X_r, y_r = make_regression(n_samples=500, random_state=0)
reg = LinearRegression().fit(X_r, y_r)            # continuous output
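The revenue example mentioned earlier, where one target admits both framings, can be sketched as follows. The features, coefficients, and noise are entirely made up; only the $1M threshold comes from the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))

# Hypothetical revenue in dollars: coefficients and noise are invented for illustration.
revenue = (
    1_000_000
    + X @ np.array([300_000, -150_000, 80_000])
    + rng.normal(scale=50_000, size=500)
)

THRESHOLD = 1_000_000
exceeds = (revenue > THRESHOLD).astype(int)  # same data, recast as a binary label

reg = LinearRegression().fit(X, revenue)                 # regression framing
clf = LogisticRegression(max_iter=1000).fit(X, exceeds)  # classification framing

new_account = X[:1]
print("predicted revenue:", reg.predict(new_account)[0])
print("P(revenue > $1M): ", clf.predict_proba(new_account)[0, 1])
```

If the downstream decision only depends on which side of the threshold an account falls, the classifier answers that question directly; if the amount itself drives the decision, the regression framing keeps information that the binary label throws away.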
