datarekha

Logistic Regression

Despite the name, it is a classifier: a linear score wᵀx + b squashed by the sigmoid into a probability, trained with log-loss.

7 min read · Intermediate · GATE DA · Lesson 85 of 122

What you'll learn

  • Logistic regression is classification, not regression — the output is a probability in (0,1)
  • The sigmoid σ(z) = 1/(1 + e⁻ᶻ) maps the linear score z = wᵀx + b to a probability
  • The decision boundary is the line z = 0, where σ = 0.5 — linear in x
  • It is trained with log-loss / cross-entropy, not squared error

Before you start

The name is a trap. Logistic regression is a classifier — it predicts a class, not a continuous number. It earns “regression” only because, under the hood, it first computes a plain linear score just like linear regression does, and then bends that score into a probability. It is still the default first classifier in industry — fast, interpretable, and the baseline a neural network has to justify beating.

So the mental model is two stages. First, a linear score z = wᵀx + b — a weighted sum of the features, exactly the line you already know. Second, a squashing function that turns any real number z into a probability between 0 and 1. That squasher is the sigmoid, and it is the whole reason this works for classification.

The sigmoid turns a score into a probability

The score z = wᵀx + b can be any real number (large positive, large negative). We need a probability in (0, 1). The sigmoid does exactly that:

σ(z) = 1 / (1 + e⁻ᶻ)

[Figure: σ(z) plotted against the score z, rising from 0 to 1, with the point z = 0, σ = 0.5 marked.] The sigmoid is an S-curve: large positive z → near 1, large negative z → near 0, and exactly 0.5 at z = 0.

Read off three facts that GATE leans on:

  • As z grows large and positive, e⁻ᶻ → 0, so σ(z) → 1.
  • As z grows large and negative, e⁻ᶻ → ∞, so σ(z) → 0.
  • At z = 0, e⁻ᶻ = 1, so σ(0) = 1 / (1 + 1) = 0.5 exactly.

The output σ(z) is read as P(y = 1 | x) — the model’s estimated probability that the point belongs to the positive class.
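The three limiting facts can be checked numerically. The following is a minimal sketch in plain Python; the branch on the sign of z is an implementation choice made here (not something the lesson prescribes) so that `math.exp` never overflows for large negative scores.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, e^-z can overflow, so use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Large negative scores approach 0, large positive scores approach 1,
# and a score of exactly 0 gives exactly 0.5.
for z in (-20.0, -2.0, 0.0, 2.0, 20.0):
    print(f"z = {z:6.1f}  sigma(z) = {sigmoid(z):.6f}")
```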

The decision boundary is z = 0

To turn a probability into a class, threshold at 0.5: predict positive when σ(z) ≥ 0.5, negative otherwise. But σ(z) = 0.5 happens exactly when z = 0, so the decision boundary is the set of points where wᵀx + b = 0. That is a straight line (a hyperplane in higher dimensions) — the boundary is linear in x, even though the sigmoid itself is curved.
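Because σ(z) ≥ 0.5 exactly when z ≥ 0, thresholding the probability and checking the sign of the score give the same class. A small sketch, assuming made-up parameters `W` and `B` (they are hypothetical and not taken from the lesson):

```python
import math

# Hypothetical parameters, chosen only for illustration.
W = (2.0, -1.0)
B = -1.0

def score(x):
    """Linear score z = w.x + b."""
    return sum(wi * xi for wi, xi in zip(W, x)) + B

def classify(x):
    """Return the score, the probability, and the class computed two ways."""
    z = score(x)
    p = 1.0 / (1.0 + math.exp(-z))
    by_probability = int(p >= 0.5)  # threshold the probability at 0.5
    by_score = int(z >= 0.0)        # equivalent test on the linear score
    return z, p, by_probability, by_score

# One point on each side of the line 2*x1 - x2 - 1 = 0, and one on the line.
for point in [(2.0, 1.0), (1.0, 1.0), (0.0, 1.0)]:
    z, p, c_p, c_z = classify(point)
    print(f"x = {point}  z = {z:+.1f}  p = {p:.3f}  class = {c_p} (by score: {c_z})")
```

The point (1, 1) lies on the boundary, so its probability is exactly 0.5 and the "≥" rule assigns it to the positive class.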

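Logistic regression is trained with log-loss (cross-entropy), −[y log p + (1 − y) log(1 − p)], not squared error. Log-loss punishes a confident wrong prediction far more harshly than squared error does: with p = 0.99 and true label 0, log-loss is −log(0.01) ≈ 4.6 and grows without bound as p → 1, while squared error can never exceed 1. That is what you want from a probability model, and with a sigmoid output log-loss is convex in w and b, which keeps the optimisation well-behaved.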
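To see the difference in penalties, the sketch below evaluates both losses for a single example whose true label is 0 as the predicted probability p moves toward a confident wrong answer. The function names are illustrative, not from any library.

```python
import math

def log_loss(y: int, p: float) -> float:
    """Binary cross-entropy for one example with label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def squared_error(y: int, p: float) -> float:
    """Squared difference between the label and the predicted probability."""
    return (y - p) ** 2

y = 0  # the true label is negative, so a large p is a confident mistake
for p in (0.1, 0.5, 0.9, 0.99, 0.999):
    print(f"p = {p:<6} log-loss = {log_loss(y, p):6.3f}  squared error = {squared_error(y, p):.3f}")
```

Squared error stays below 1 however wrong the prediction is, while log-loss keeps growing as p approaches 1.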

How GATE asks this

Usually a single question that probes one of three things: evaluate the sigmoid at a given score (often a NAT, with the relevant e value supplied), identify the decision boundary (the correct answer is the linear equation wᵀx + b = 0, not a curve), or name the loss (cross-entropy / log-loss, never mean squared error). A favourite distractor claims logistic regression outputs an unbounded continuous target value, like linear regression. It does not: the output is a class probability in (0, 1).

Worked example — evaluate the sigmoid

A logistic model produces score z for a point. Find σ(z) for z = 0, z = 2, and z = −2. Use e⁻² ≈ 0.135. Which class is z = 2?

Apply σ(z) = 1 / (1 + e⁻ᶻ) term by term:

σ(0)  = 1 / (1 + e⁰)    = 1 / (1 + 1)     = 0.5      ← on the boundary
σ(2)  = 1 / (1 + e⁻²)   = 1 / (1 + 0.135) = 1/1.135  ≈ 0.881
σ(−2) = 1 / (1 + e²)    = 1 / (1 + 7.389) = 1/8.389  ≈ 0.119

A quick shortcut to check σ(−2): the sigmoid is symmetric, σ(−z) = 1 − σ(z), so σ(−2) = 1 − 0.881 = 0.119. ✓

Since σ(2) ≈ 0.881 > 0.5, the point with z = 2 is classified positive, with about 88% confidence. The point with z = −2 would be classified negative (only ~12% chance of being positive).
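As a sanity check on the arithmetic, this short sketch compares the hand calculation (using the rounded value e⁻² ≈ 0.135 supplied in the question) with the value from `math.exp`. The variable names are illustrative.

```python
import math

E_MINUS_2 = 0.135  # rounded value supplied in the question

approx_pos = 1 / (1 + E_MINUS_2)    # sigma(2) from the supplied value
approx_neg = 1 - approx_pos         # sigma(-2) via the symmetry shortcut
exact_pos = 1 / (1 + math.exp(-2))  # sigma(2) at full precision
exact_neg = 1 / (1 + math.exp(2))   # sigma(-2) at full precision

print(round(approx_pos, 3), round(exact_pos, 3))  # 0.881 0.881
print(round(approx_neg, 3), round(exact_neg, 3))  # 0.119 0.119
```

To three decimals the rounded input makes no difference, which is why the question can supply e⁻² to only three places.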

Quick check

Q1. A logistic regression model computes a score z = 2 for a sample. Given e⁻² ≈ 0.135, what probability σ(z) does it assign to the positive class? (Numerical answer, 3 decimals.)
Q2. For a logistic model with weights w = (1, 2) and bias b = −5, the decision boundary is the set of points (x₁, x₂) satisfying which equation?
Q3. What is the score z that makes σ(z) = 0.5? (Numerical answer.)
Q4. Which statements about logistic regression are TRUE? (Select all that apply.)
Q5. A model outputs σ(z) = 0.881 for the positive class. By the sigmoid's symmetry σ(−z) = 1 − σ(z), what probability would a sample with score −z receive for the positive class? (Numerical answer, 3 decimals.)
Q6. Why is squared error a poor loss for logistic regression compared with log-loss?

Practice this in an interview

Why is linear regression unsuitable for binary classification, and what specific problems does logistic regression fix?

Linear regression predicts unbounded real values, so when its outputs are read as probabilities they can fall below 0 or above 1, and squared error penalizes predictions that are confidently correct: a score of 5 for a true label of 1 is charged as a large error. Logistic regression fixes this by applying the sigmoid to map any real score to (0,1) and optimizing log-loss, which is a proper scoring rule, so it rewards well-calibrated probabilities.

Explain the relationship between the sigmoid function, odds, and log-odds in logistic regression.

Logistic regression models log-odds as a linear function of the features. Exponentiating the coefficients gives odds ratios, and applying the sigmoid to the linear score converts it to a probability. These three representations are equivalent reformulations of the same model.
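The equivalence can be verified directly: taking the log-odds of σ(z) returns z, and adding a coefficient to the score multiplies the odds by its exponential. A minimal sketch, where the score `z` and the coefficient `w_j` are hypothetical values picked for illustration:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def odds(p: float) -> float:
    """Odds in favour of the positive class: p / (1 - p)."""
    return p / (1.0 - p)

def log_odds(p: float) -> float:
    """The logit, which is the inverse of the sigmoid."""
    return math.log(odds(p))

z = 2.0  # hypothetical linear score
p = sigmoid(z)
print(f"p = {p:.3f}, odds = {odds(p):.3f}, log-odds = {log_odds(p):.3f}")

# Raising a feature by one unit adds its coefficient w_j to the score,
# which multiplies the odds by exp(w_j): the odds ratio.
w_j = 0.7  # hypothetical coefficient
ratio = odds(sigmoid(z + w_j)) / odds(sigmoid(z))
print(f"odds ratio = {ratio:.4f}, exp(w_j) = {math.exp(w_j):.4f}")
```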

What loss function does logistic regression optimize, and why is it convex?

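Logistic regression minimizes binary cross-entropy (log-loss), which is the negative log-likelihood of Bernoulli labels whose success probability is the sigmoid of the linear score. The Hessian of log-loss with respect to the parameters is positive semi-definite everywhere, so the loss surface is convex and any local minimum is global. The minimizer is unique only when the Hessian is positive definite, for example with full-rank features and non-separable data, or with an L2 penalty; on linearly separable data the unregularized loss has no finite minimizer because the weights can grow without bound.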
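Convexity is what lets a simple optimizer work here. The sketch below fits a one-feature model by plain batch gradient descent on mean log-loss. Everything in it is an assumption made for illustration: the toy dataset (deliberately not linearly separable), the learning rate, and the step count. Real libraries use more sophisticated solvers.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(w, b, xs, ys):
    """Average binary cross-entropy of the model sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

def fit(xs, ys, lr=0.5, steps=5000):
    """Batch gradient descent; the gradient of mean log-loss is mean((p - y) * x)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: labels overlap around x = 2.0 to 2.5, so the classes are not separable.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

start_loss = mean_log_loss(0.0, 0.0, xs, ys)  # ln(2): every prediction is 0.5
w, b = fit(xs, ys)
final_loss = mean_log_loss(w, b, xs, ys)
print(f"loss {start_loss:.3f} -> {final_loss:.3f}, boundary at x = {-b / w:.2f}")
```

The learned boundary is the point where w·x + b = 0, that is x = −b / w, which for this toy data should land between the overlapping examples.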

What is log loss and why does it penalise confident wrong predictions more than uncertain ones?

Log loss (cross-entropy loss) measures how well a model's predicted probabilities match the true labels: it is the negative log of the probability assigned to the correct class. It penalises confident wrong predictions severely because log(p) approaches negative infinity as p approaches zero. Predicting 0.99 for the wrong class leaves 0.01 for the correct one and costs −ln(0.01) ≈ 4.61, about five times the −ln(0.4) ≈ 0.92 cost of predicting 0.6 for the wrong class, and the penalty is unbounded as the confidence approaches 1. A perfect model achieves 0; a binary classifier that always predicts 0.5 achieves ln(2) ≈ 0.693.
