datarekha
Machine Learning · Easy · Asked at Google, Meta, Microsoft

Why is linear regression unsuitable for binary classification, and what specific problems does logistic regression fix?

The short answer

Linear regression predicts unbounded real values, so it can output "probabilities" below 0 or above 1, and its squared-error loss penalizes scores that overshoot the label even when they fall on the correct side of the threshold. Logistic regression fixes this by applying the sigmoid to map any real score to (0,1) and optimizing log-loss, the negative Bernoulli log-likelihood, which is a proper scoring rule and therefore rewards well-calibrated probabilities.

How to think about it

Problem 1 — Unbounded outputs. Linear regression predicts ŷ = Xβ, which ranges over all reals. Interpreting values outside [0, 1] as probabilities is meaningless, and a decision threshold of 0.5 becomes arbitrary when predictions routinely exceed 1 or go negative.
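To make the unbounded-output problem concrete, here is a minimal sketch that fits a one-feature least-squares line to binary labels. The toy data and the helper names fit_ols_1d and predict are made up for illustration and are not part of any library.

```python
# Toy 1-D data (made up for illustration): feature x, binary label y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_ols_1d(xs, ys):
    """Closed-form least squares for y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_ols_1d(xs, ys)


def predict(x):
    """Raw linear prediction; nothing constrains it to [0, 1]."""
    return intercept + slope * x


for x in (1.0, 4.5, 8.0, 12.0):
    print(f"x={x:5.1f}  y_hat={predict(x):+.3f}")
```

On this toy data the fitted line already dips below 0 at x = 1 and rises above 1 at x = 8, both of which are training points, and it keeps growing as x moves further out.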

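Problem 2 — Wrong loss function. OLS minimizes squared error between an unbounded linear output and the 0/1 labels. The loss treats any deviation from the label as error, so a score of 1.5 for a true positive costs exactly as much as a score of 0.5, and points far on the correct side of the boundary pull the fitted line toward themselves and can shift the 0.5 threshold. (Squared error on a probability already confined to [0, 1] is the Brier score, which is a proper scoring rule; the trouble comes from applying it to an unbounded linear output.) Squared error also saturates at 1 for a confidently wrong probability, whereas log-loss grows without bound.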
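A few numbers help when comparing the two losses. The sketch below evaluates squared error and log-loss for a true positive (y = 1) at several hypothetical model outputs; the function names and the chosen outputs are illustrative only.

```python
import math


def squared_error(y, y_hat):
    """Squared error; y_hat may be any real number."""
    return (y - y_hat) ** 2


def log_loss(y, p):
    """Binary cross-entropy for one example; requires 0 < p < 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


y = 1  # true positive

# Squared error is symmetric around the label: overshooting to 1.5
# costs the same as the undecided output 0.5.
for y_hat in (0.5, 0.6, 0.99, 1.5):
    print(f"y_hat={y_hat:<5} squared_error={squared_error(y, y_hat):.4f}")

# For a confidently wrong probability, squared error stays below 1
# while log-loss keeps growing as p approaches 0.
for p in (0.1, 0.01, 0.001):
    print(f"p={p:<6} squared_error={squared_error(y, p):.4f} log_loss={log_loss(y, p):.4f}")
```

Note that 0.99 receives a smaller squared error than 0.6, as expected; the penalty only becomes counterproductive once the output overshoots the label.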

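Problem 3 — Non-constant variance. Binary outcomes are Bernoulli, so the variance p(1-p) depends on p, violating homoscedasticity. Even when the linear form of the mean is correct, OLS estimates are unbiased but inefficient, and the usual standard errors are invalid unless heteroscedasticity-robust ones are used.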
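To see how much the noise level moves with p, the Bernoulli variance can be tabulated for a few probabilities. The values of p here are arbitrary, and bernoulli_variance is just an illustrative helper.

```python
def bernoulli_variance(p):
    """Variance of a Bernoulli(p) outcome."""
    return p * (1 - p)


for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"p={p:.2f}  variance={bernoulli_variance(p):.4f}")
```

The variance peaks at p = 0.5 and shrinks toward 0 as p approaches either extreme, so observations do not carry equal noise, which is what OLS assumes.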

What logistic regression does instead:

The sigmoid maps the linear score z = Xβ to a probability:

σ(z) = 1 / (1 + e^(-z))
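A direct translation of this formula overflows in floating point for large negative z, because e^(-z) becomes huge. One common workaround, sketched here in plain Python, branches on the sign of z; this is an illustrative implementation, not the one any particular library uses.

```python
import math


def sigmoid(z):
    """Sigmoid that never passes a large positive argument to math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so this is at most 1 and cannot overflow
    return ez / (1.0 + ez)


for z in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(f"z={z:8.1f}  sigmoid={sigmoid(z):.6f}")
```

Mathematically the sigmoid stays strictly inside (0,1), but in floating point the result saturates to exactly 0.0 or 1.0 for very large |z|, which is why log-loss implementations usually clip probabilities before taking the logarithm.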

The model then maximizes the log-likelihood (minimizes log-loss / binary cross-entropy):

L = -[y log(p) + (1-y) log(1-p)]

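This objective is convex in β, so any local minimum is a global one. A finite minimizer exists when the classes are not perfectly separable (otherwise the coefficients grow without bound unless a penalty is added), and it is unique when X has full column rank. The decision boundary p = 0.5 corresponds exactly to Xβ = 0.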
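As a rough sketch of how this objective can be minimized, here is plain batch gradient descent on a one-feature model with an intercept. The toy data, learning rate, and iteration count are arbitrary choices for illustration; libraries such as scikit-learn use more sophisticated solvers.

```python
import math


def sigmoid(z):
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def mean_log_loss(w, b, xs, ys, eps=1e-12):
    """Average binary cross-entropy of the model p = sigmoid(w * x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


# Toy, non-separable 1-D data (made up): centered feature, binary label.
xs = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.1
initial_loss = mean_log_loss(w, b, xs, ys)

for _ in range(2000):
    # The gradient of the mean log-loss is the average of (p - y) * x
    # for the weight and the average of (p - y) for the intercept.
    errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(errors) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

final_loss = mean_log_loss(w, b, xs, ys)
boundary = -b / w  # the x where w * x + b = 0, i.e. where p = 0.5

print(f"loss {initial_loss:.4f} -> {final_loss:.4f}")
print(f"w={w:.3f}  b={b:.3f}  boundary x={boundary:.3f}")
```

Starting from w = b = 0 every prediction is 0.5, so the initial loss is log 2. Because the labels overlap, the weights settle at finite values instead of growing indefinitely, and the predicted probability at the boundary x is 0.5.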

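from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; substitute your own features and labels.
X_train, X_test, y_train, y_test = train_test_split(*make_classification(random_state=0), random_state=0)

clf = LogisticRegression(max_iter=1000)  # L2-regularized by default
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # estimated P(y = 1) for each test row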