Logistic Regression
Despite the name it is a classifier: a linear score wᵀx + b squashed by the sigmoid into a probability, trained with log-loss.
What you'll learn
- Logistic regression is classification, not regression — the output is a probability in (0,1)
- The sigmoid σ(z) = 1/(1 + e⁻ᶻ) maps the linear score z = wᵀx + b to a probability
- The decision boundary is the line z = 0, where σ = 0.5 — linear in x
- It is trained with log-loss / cross-entropy, not squared error
Before you start
The name is a trap. Logistic regression is a classifier — it predicts a class, not a continuous number. It earns “regression” only because, under the hood, it first computes a plain linear score just like linear regression does, and then bends that score into a probability. It is still the default first classifier in industry — fast, interpretable, and the baseline a neural network has to justify beating.
So the mental model is two stages. First, a linear score z = wᵀx + b — a
weighted sum of the features, exactly the line you already know. Second, a
squashing function that turns any real number z into a probability between 0
and 1. That squasher is the sigmoid, and it is the whole reason this works for
classification.
The sigmoid turns a score into a probability
The score z = wᵀx + b can be any real number (large positive, large negative).
We need a probability in (0, 1). The sigmoid does exactly that:
σ(z) = 1 / (1 + e⁻ᶻ)
Read off three facts that GATE leans on:
- As z grows large and positive, e⁻ᶻ → 0, so σ(z) → 1.
- As z grows large and negative, e⁻ᶻ → ∞, so σ(z) → 0.
- At z = 0, e⁻ᶻ = 1, so σ(0) = 1 / (1 + 1) = 0.5 exactly.
The output σ(z) is read as P(y = 1 | x) — the model’s estimated probability
that the point belongs to the positive class.
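The three facts above can be checked numerically. The following is a minimal sketch, not a reference implementation: the function name `sigmoid` and the sample scores are chosen here for illustration, and the two-branch form is one common way to avoid overflow in `math.exp` for very negative scores.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        # -z <= 0 here, so math.exp cannot overflow.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the algebraically equal form e^z / (1 + e^z),
    # so math.exp never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Illustrative scores: far negative, moderate, zero, moderate, far positive.
for z in (-20.0, -2.0, 0.0, 2.0, 20.0):
    print(f"z = {z:6.1f}   sigmoid(z) = {sigmoid(z):.6f}")
```

The printed values move from near 0, through 0.5 at z = 0, to near 1, matching the limits listed above.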
The decision boundary is z = 0
To turn a probability into a class, threshold at 0.5: predict positive when
σ(z) ≥ 0.5, negative otherwise. But σ(z) = 0.5 happens exactly when z = 0, so
the decision boundary is the set of points where wᵀx + b = 0. That is a
straight line (a hyperplane in higher dimensions) — the boundary is linear in
x, even though the sigmoid itself is curved.
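To see that thresholding the probability at 0.5 is the same as checking the sign of the linear score, here is a small sketch. The weights `w = [1.0, 1.0]`, the bias `b = -3.0`, and the sample points are made-up values for illustration, so the boundary in this example is the line x₁ + x₂ − 3 = 0.

```python
import math


def score(w, x, b):
    """Linear score z = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, x, b, threshold=0.5):
    """Return 1 if sigmoid(z) >= threshold, otherwise 0."""
    p = 1.0 / (1.0 + math.exp(-score(w, x, b)))
    return 1 if p >= threshold else 0


# Hypothetical parameters: the boundary is x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (4.0, 1.0)]

for x in points:
    z = score(w, x, b)
    # The last two columns agree: thresholding p at 0.5 equals testing z >= 0.
    print(x, f"z = {z:+.1f}", predict(w, x, b), int(z >= 0))
```

The point (1, 2) lies exactly on the boundary (z = 0), so with the "≥ 0.5" rule it is assigned to the positive class.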
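The model is trained with log-loss (cross-entropy), −[y log p + (1 − y) log(1 − p)],
where p = σ(z) is the predicted probability and y ∈ {0, 1} is the true label,
not with squared error. Log-loss punishes a confident wrong prediction (say p = 0.99
when the true label is 0) far more harshly than squared error does, which is what
you want from a probability model. It also keeps the optimisation well-behaved:
log-loss is convex in w and b, whereas squared error applied to a sigmoid output is not.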
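The difference in how the two losses treat a confident mistake can be tabulated with a short sketch. The helper names `log_loss` and `squared_error` and the probabilities tried are illustrative choices, and `p` is assumed to lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math


def log_loss(y: int, p: float) -> float:
    """Per-example log-loss: -[y log p + (1 - y) log(1 - p)], with 0 < p < 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def squared_error(y: int, p: float) -> float:
    """Per-example squared error (y - p)^2."""
    return (y - p) ** 2


# True label is 0; the predictions grow more and more confidently wrong.
y = 0
for p in (0.6, 0.9, 0.99, 0.999):
    print(f"p = {p:<6} log-loss = {log_loss(y, p):.3f}   "
          f"squared error = {squared_error(y, p):.3f}")
```

Squared error can never exceed 1 for a probability, while log-loss grows without bound as the wrong prediction approaches certainty.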
How GATE asks this
Usually an MCQ that probes one of three things: evaluate the sigmoid at a
given score (often a NAT, with the relevant e value supplied), identify the
decision boundary (the correct answer is the linear equation wᵀx + b = 0, not a
curve), or name the loss (cross-entropy / log-loss, never mean squared error). A
favourite distractor claims logistic regression outputs an unbounded continuous
quantity like linear regression. It does not; it outputs a class probability in (0, 1).
Worked example — evaluate the sigmoid
A logistic model produces score z for a point. Find σ(z) for z = 0, z = 2, and z = −2. Use e⁻² ≈ 0.135. Which class is z = 2?
Apply σ(z) = 1 / (1 + e⁻ᶻ) term by term:
σ(0) = 1 / (1 + e⁰) = 1 / (1 + 1) = 0.5 ← on the boundary
σ(2) = 1 / (1 + e⁻²) = 1 / (1 + 0.135) = 1/1.135 ≈ 0.881
σ(−2) = 1 / (1 + e²) = 1 / (1 + 7.389) = 1/8.389 ≈ 0.119
A quick shortcut to check σ(−2): the sigmoid is symmetric about the point (0, 0.5),
so σ(−z) = 1 − σ(z), giving σ(−2) = 1 − 0.881 = 0.119. ✓
Since σ(2) ≈ 0.881 > 0.5, the point with z = 2 is classified positive, with
about 88% confidence. The point with z = −2 would be classified negative
(only ~12% chance of being positive).
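The hand calculation can be cross-checked in a few lines. This is only a sketch: it uses `math.exp` rather than the rounded value e⁻² ≈ 0.135 supplied in the question, so the results agree with the worked figures to about three decimal places rather than digit for digit.

```python
import math


def sigmoid(z: float) -> float:
    """Plain sigmoid; fine for the small scores used in this example."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (0.0, 2.0, -2.0):
    p = sigmoid(z)
    label = "positive" if p >= 0.5 else "negative"
    print(f"z = {z:+.0f}   sigmoid(z) = {p:.3f}   class = {label}")

# Symmetry shortcut used in the worked example: sigmoid(-z) = 1 - sigmoid(z).
symmetry_gap = abs(sigmoid(-2.0) - (1.0 - sigmoid(2.0)))
print(f"symmetry gap for z = 2: {symmetry_gap:.2e}")
```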
Quick check
Practice this in an interview
Linear regression predicts unbounded real values, so when used as a classifier it can output values below 0 or above 1, which are not valid probabilities, and its squared-error loss penalizes confident correct predictions. Logistic regression fixes this by applying the sigmoid to map any real score to (0,1) and optimizing log-loss, which is a proper scoring rule aligned with probability calibration.
Logistic regression models log-odds as a linear function of the features. Exponentiating the coefficients gives odds ratios, and applying the sigmoid to the linear score converts it to a probability. These three representations are equivalent reformulations of the same model.
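The equivalence of the three views (log-odds, odds ratio, probability) can be illustrated numerically. In this sketch the weights `w`, bias `b`, and feature vector `x` are hypothetical values picked only for the demonstration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def odds(p: float) -> float:
    """Odds of the positive class, p / (1 - p)."""
    return p / (1.0 - p)


def log_odds(p: float) -> float:
    """Log-odds (logit), the inverse of the sigmoid."""
    return math.log(odds(p))


# Hypothetical model and input, for illustration only.
w, b = [0.8, -0.5], 0.2
x = [2.0, 1.0]

z = sum(wi * xi for wi, xi in zip(w, x)) + b
p = sigmoid(z)

# Increase the first feature by one unit and recompute the probability.
x_plus = [x[0] + 1.0, x[1]]
z_plus = sum(wi * xi for wi, xi in zip(w, x_plus)) + b
p_plus = sigmoid(z_plus)

# The odds are multiplied by exp(w[0]), whatever the starting point x.
ratio = odds(p_plus) / odds(p)

print(f"linear score z      = {z:.4f}")
print(f"log-odds of p       = {log_odds(p):.4f}")
print(f"odds ratio observed = {ratio:.4f}")
print(f"exp(w[0])           = {math.exp(w[0]):.4f}")
```

The log-odds of the predicted probability recovers the linear score, and a one-unit change in a feature multiplies the odds by the exponential of its coefficient.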
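Logistic regression minimizes binary cross-entropy (log-loss), which is the negative log-likelihood of the Bernoulli distribution given the sigmoid-transformed linear predictions. The Hessian of log-loss is positive semi-definite everywhere, guaranteeing a convex surface on which every local minimum is a global minimum. The minimizer is unique only when the Hessian is positive definite, and no finite minimizer exists if the classes are perfectly separable and the model is unregularized.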
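Because the loss surface is convex, even plain batch gradient descent is enough to fit a small model. The sketch below is one possible way to do it, not how any particular library trains: the toy dataset, the learning rate `lr`, and the step count are assumptions chosen for illustration, and the classes deliberately overlap so that a finite minimum exists. It relies on the standard gradient of the mean log-loss, the average of (p − y)·x over the examples.

```python
import math


def sigmoid(z: float) -> float:
    """Overflow-safe sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def mean_log_loss(w, b, X, y):
    """Average binary cross-entropy over the dataset."""
    eps = 1e-12  # clamp probabilities so log() never sees exactly 0
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        p = min(max(p, eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)


def fit(X, y, lr=0.1, steps=2000):
    """Batch gradient descent on the mean log-loss, starting from zeros."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    history = []
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
        history.append(mean_log_loss(w, b, X, y))
    return w, b, history


# Toy 1-D data with overlapping classes (not perfectly separable).
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 0, 1, 1]

w, b, history = fit(X, y)
print(f"w = {w[0]:.3f}, b = {b:.3f}")
print(f"loss after first step = {history[0]:.4f}, final loss = {history[-1]:.4f}")
```

With this small step size the recorded loss decreases at every iteration, which is the behaviour convexity and smoothness of log-loss lead one to expect.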
Log loss (cross-entropy loss) measures how well a model's predicted probabilities match the true labels: it is the negative log-likelihood of the correct class. It penalises confident wrong predictions severely because log(p) approaches negative infinity as the probability p assigned to the correct class approaches zero. Predicting 0.99 for the wrong class costs −ln(0.01) ≈ 4.61, roughly five times the −ln(0.4) ≈ 0.92 cost of predicting 0.6 for the wrong class, and the penalty grows without bound as the prediction approaches certainty. A perfect model achieves 0; a binary classifier that always predicts 0.5 achieves ln(2) ≈ 0.693.