Why is linear regression unsuitable for binary classification, and what specific problems does logistic regression fix?
Linear regression predicts unbounded real values, so it can output "probabilities" below 0 or above 1, and its squared-error loss penalizes scores that overshoot the label even when they fall on the correct side of the decision boundary. Logistic regression fixes this by applying the sigmoid to map any real score into (0, 1) and by optimizing log-loss, a strictly proper scoring rule that corresponds to maximum likelihood for a Bernoulli outcome.
How to think about it
Problem 1 — Unbounded outputs. Linear regression predicts ŷ = Xβ, which ranges over all reals. Interpreting values outside [0, 1] as probabilities is meaningless, and a decision threshold of 0.5 becomes arbitrary when predictions routinely exceed 1 or go negative.
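To make Problem 1 concrete, here is a minimal sketch that fits ordinary least squares to a binary label and inspects the predictions. The toy data (one feature, label 1 when x > 5) and the names x_new and y_hat are assumptions made up for illustration, not part of the original question.

```python
import numpy as np

# Hypothetical toy data: one feature, label is 1 when x > 5.
x = np.arange(0.0, 11.0)          # 0, 1, ..., 10
y = (x > 5).astype(float)         # six zeros followed by five ones

# Ordinary least squares with an intercept: y_hat = b0 + b1 * x
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Evaluate the fitted line inside and outside the training range.
x_new = np.array([-5.0, 0.0, 5.0, 10.0, 20.0])
y_hat = beta[0] + beta[1] * x_new

for xi, yi in zip(x_new, y_hat):
    print(f"x = {xi:5.1f}  ->  y_hat = {yi:6.3f}")
```

On this data the fitted line already dips below 0 at x = 0 and rises above 1 at x = 10, both inside the training range, and it keeps growing as x moves further out.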
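Problem 2 — Wrong loss function. OLS minimizes squared error on the raw linear score. For a binary label y ∈ {0,1}, this penalizes any score that overshoots the label: a score of 3 for a true positive costs (1 - 3)² = 4, far more than a score of 0.5, which costs 0.25. Points that lie far on the correct side of the boundary therefore pull the fit toward themselves and can shift the decision boundary. Squared error on a genuine probability (the Brier score) is a proper scoring rule, but its penalty for a confidently wrong prediction is capped at 1, whereas log-loss grows without bound; combined with a sigmoid, squared error also gives a non-convex objective.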
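A quick numeric check of how squared error and log-loss score a true positive (y = 1). The prediction values below are made up for illustration, and the helper names squared_error and log_loss are hypothetical, not library functions.

```python
import math

def squared_error(y, pred):
    # Per-example squared error; pred may be any real number.
    return (y - pred) ** 2

def log_loss(y, p):
    # Per-example binary cross-entropy; p must lie strictly in (0, 1).
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

y = 1  # the true label is positive

# Valid probabilities: both losses shrink as the prediction approaches 1.
for p in (0.6, 0.99):
    print(f"p = {p:4.2f}  squared error = {squared_error(y, p):.4f}  "
          f"log-loss = {log_loss(y, p):.4f}")

# A raw linear score of 3.0 is on the correct side of 0.5,
# yet squared error charges it heavily for overshooting the label.
se_overshoot = squared_error(y, 3.0)

# A confidently wrong probability: squared error stays below 1,
# while log-loss keeps growing as p approaches 0.
se_wrong = squared_error(y, 0.001)
ll_wrong = log_loss(y, 0.001)
print(f"overshoot: {se_overshoot:.1f}  wrong: {se_wrong:.3f} vs {ll_wrong:.3f}")
```

The overshoot case only arises because the linear model's output is unbounded; once predictions are constrained to (0, 1), the remaining difference is how strongly each loss punishes confident mistakes.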
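Problem 3 — Non-constant variance. Binary outcomes are Bernoulli, so the variance p(1-p) depends on p, violating homoscedasticity. Even if the linear model for the mean were correct, the OLS estimates would be unbiased but inefficient, and the usual standard errors would be invalid without a heteroscedasticity-robust correction.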
What logistic regression does instead:
The sigmoid maps the linear score z = Xβ to a probability:
σ(z) = 1 / (1 + e^(-z))
The model then maximizes the log-likelihood (minimizes log-loss / binary cross-entropy):
L = -[y log(p) + (1-y) log(1-p)]
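This objective is convex in β, so it has no spurious local minima. The minimizer is unique and finite when the features are not collinear and the classes are not perfectly separable; under perfect separation the coefficients diverge unless regularization is added. The decision boundary p = 0.5 corresponds exactly to Xβ = 0.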
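# Assumes X_train, y_train and X_test already exist (e.g. from train_test_split).
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(max_iter=1000)  # scikit-learn applies L2 regularization by default (C=1.0)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # estimated P(y=1 | x), always in (0, 1); calibration is not guaranteed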
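For intuition about what such a fit optimizes, the sigmoid and log-loss defined above can also be written out by hand. This is a minimal sketch, not how scikit-learn solves the problem: it uses plain gradient descent with no regularization, and the toy data, step size, and iteration count are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)), maps any real score into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p, eps=1e-12):
    # Mean binary cross-entropy; clipping avoids log(0).
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical toy data: two overlapping 1-D classes (not separable).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 1.0, 100), rng.normal(1.0, 1.0, 100)])
y = np.concatenate([np.zeros(100), np.ones(100)])
X = np.column_stack([np.ones_like(x), x])   # intercept column + feature

beta = np.zeros(2)
losses = []
for _ in range(500):
    p = sigmoid(X @ beta)
    losses.append(log_loss(y, p))
    grad = X.T @ (p - y) / len(y)           # gradient of the mean log-loss
    beta -= 0.5 * grad                      # assumed step size

print("final log-loss:", losses[-1])
print("sigmoid(0) =", sigmoid(0.0))         # score 0 maps to p = 0.5
```

Because the objective is convex, gradient descent with a small enough step size decreases the loss steadily from its starting value of log 2 (all predictions at 0.5), and the point where Xβ = 0 is exactly where the predicted probability crosses 0.5.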