What loss function does logistic regression optimize, and why is it convex?

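Logistic regression minimizes binary cross-entropy (log-loss), which is the negative log-likelihood of Bernoulli-distributed labels whose success probability is the sigmoid of a linear score. The Hessian of log-loss is positive semi-definite everywhere, which guarantees a convex surface: every local minimum is global. Convexity alone does not guarantee that the minimum is unique or even attained; on perfectly separable data, for example, the unregularized loss keeps decreasing as the weights grow without bound.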
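The positive semi-definiteness of the Hessian can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the two-feature toy data, the weight vectors, and the helper names are hypothetical, and a 2x2 symmetric matrix is tested for positive semi-definiteness through its trace and determinant.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss_hessian(w, X):
    """Hessian of the summed binary cross-entropy for 2-feature data.

    H = sum_i p_i (1 - p_i) x_i x_i^T, with p_i = sigmoid(w . x_i).
    The labels drop out of the second derivative, so they are not needed.
    """
    h11 = h12 = h22 = 0.0
    for x1, x2 in X:
        p = sigmoid(w[0] * x1 + w[1] * x2)
        s = p * (1.0 - p)  # non-negative, at most 0.25
        h11 += s * x1 * x1
        h12 += s * x1 * x2
        h22 += s * x2 * x2
    return h11, h12, h22

def is_psd_2x2(h11, h12, h22, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its trace and determinant are >= 0."""
    return h11 + h22 >= -tol and h11 * h22 - h12 * h12 >= -tol

# Hypothetical toy inputs and weights, chosen only for illustration.
X = [(1.0, 2.0), (-1.5, 0.5), (0.3, -2.0), (2.0, 1.0)]
weights_to_try = [(0.0, 0.0), (3.0, -1.0), (-5.0, 4.0), (0.2, 10.0)]
all_psd = all(is_psd_2x2(*log_loss_hessian(w, X)) for w in weights_to_try)
print(all_psd)
```

A check at a handful of weight vectors is evidence, not a proof; the proof is that each term p(1 − p) x xᵀ is itself positive semi-definite.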

What is the kernel trick in SVM, and why does it work?

The kernel trick lets an SVM find a nonlinear decision boundary by implicitly mapping data into a higher-dimensional space where it may become linearly separable, without ever computing that mapping explicitly. It works because the SVM's dual formulation depends on the data only through dot products between points, and a kernel function returns the value of that high-dimensional dot product directly from the original inputs. Common kernels are linear, polynomial, and RBF.
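A small numerical check makes the "dot product without the mapping" idea concrete. This is a minimal sketch for one specific case, the homogeneous degree-2 polynomial kernel on 2-D inputs; the function names and the sample points are hypothetical.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    dot = x[0] * z[0] + x[1] * z[1]
    return dot * dot

def explicit_map(x):
    """Explicit 3-D feature map for the same kernel.

    phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2), so phi(x) . phi(z) = (x . z)^2.
    """
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def dot3(a, b):
    """Plain dot product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical sample points.
x, z = (1.0, 2.0), (3.0, -1.0)
via_kernel = poly2_kernel(x, z)                        # computed in 2-D
via_mapping = dot3(explicit_map(x), explicit_map(z))   # computed in 3-D
print(via_kernel, via_mapping)
```

Both routes give the same number up to rounding, but the kernel never builds the 3-D vectors. For an RBF kernel the implicit space is infinite-dimensional, so the kernel route is the only practical one.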

Why does sigmoid saturation cause vanishing gradients, and why is tanh only a partial fix?

Sigmoid's derivative peaks at 0.25 and approaches zero in both tails, so a backpropagated gradient that picks up one such factor per layer shrinks exponentially with depth. Tanh's derivative peaks at 1 and its output is zero-centered, which helps weight update symmetry, but it still saturates at large magnitudes, so the gradient still shrinks to near-zero in both tails.
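The numbers behind this are easy to reproduce. The sketch below is a simplified illustration: it ignores the weight matrices and looks only at the activation-derivative factor each layer contributes, and the helper names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    """Derivative of sigmoid: s(z) * (1 - s(z)), maximal at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def d_tanh(z):
    """Derivative of tanh: 1 - tanh(z)^2, maximal at z = 0."""
    return 1.0 - math.tanh(z) ** 2

def best_case_chain(peak_derivative, depth):
    """Largest possible product of `depth` activation derivatives."""
    return peak_derivative ** depth

for z in (0.0, 2.0, 5.0):
    print(f"z={z:4.1f}  sigmoid'={d_sigmoid(z):.6f}  tanh'={d_tanh(z):.6f}")

for depth in (5, 10, 20):
    print(f"depth={depth:2d}  sigmoid best case={best_case_chain(0.25, depth):.3e}")
```

Even in the best case, where every pre-activation sits exactly at 0, ten sigmoid layers scale the gradient by less than one millionth. Tanh avoids that best-case penalty, but its derivative at z = 5 is already far smaller than sigmoid's, which is the "partial fix" in the question.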

Compare sigmoid, tanh, ReLU, leaky ReLU, and GELU — when would you pick each?

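Sigmoid squashes to (0,1) and saturates at extremes, causing vanishing gradients; it suits output layers that must produce a probability. Tanh is zero-centered but still saturates. ReLU avoids saturation for positive inputs and trains fast but can produce dead neurons. Leaky ReLU keeps a small nonzero slope for negative inputs, which mitigates dying neurons. GELU is smooth, weights each input by the standard normal CDF, and is a common default in transformer architectures such as BERT and GPT-2.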
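For reference, the five activations can be written out in a few lines. This is a sketch using the standard textbook definitions; the leaky slope of 0.01 is a conventional choice assumed here, and GELU is given in its exact form, x times the standard normal CDF.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def relu(z):
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    """`slope` is the assumed negative-side gradient (0.01 is conventional)."""
    return z if z > 0 else slope * z

def gelu(z):
    """Exact GELU: z * Phi(z), with Phi the standard normal CDF."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

activations = [sigmoid, tanh, relu, leaky_relu, gelu]
print("z".rjust(6) + "".join(f.__name__.rjust(12) for f in activations))
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"{z:6.1f}" + "".join(f"{f(z):12.4f}" for f in activations))
```

Reading down the columns shows the contrasts: sigmoid and tanh flatten at both ends, ReLU is exactly zero for negative inputs, leaky ReLU lets a small negative signal through, and GELU bends smoothly through zero.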

Convexity & Single-Variable Optimization — GATE DA

What you'll learn

Convex = graph lies below every chord (bowl-shaped); for twice-differentiable f, this is exactly f''(x) ≥ 0 everywhere

f'' > 0 everywhere ⇒ strictly convex; the payoff is that any local minimum is global

A strictly convex function has at most one minimizer — but it need not have one at all

Why convex loss functions are the well-behaved ones to optimize in ML

The last lesson ended on an anxiety. Finding a minimum has so far meant solving f'(x) = 0 and hoping the dip you land in is the lowest one — and on a wiggly function with several valleys, “local minimum” is the most you can honestly claim. Convexity is the property that makes the anxiety vanish. A convex function is one single bowl, so the first dip you find is the dip; there is nothing lower hiding elsewhere. It is the quiet reason so much of machine learning is built on convex loss functions.

What convex means

Picture a bowl. A function is convex if its graph lies below (or on) any chord — pick two points on the curve, draw the straight segment between them, and the curve never pokes above that segment. A non-convex curve has wiggles: it rises above some chords and dips into several separate valleys, each one a trap an optimiser could fall into.

Convex: every chord sits above the curve, so there is a single valley. Non-convex: chords get crossed, and separate valleys appear.
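The chord picture translates directly into a check that can be run on sampled points. The sketch below is an illustration only: the two test functions, the grid, and the helper names are hypothetical, and passing a sampled test is evidence of convexity rather than a proof.

```python
def lies_below_chord(f, a, b, samples=50, tol=1e-12):
    """Check f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b) at sampled t in [0, 1]."""
    for k in range(samples + 1):
        t = k / samples
        x = t * a + (1.0 - t) * b
        chord_height = t * f(a) + (1.0 - t) * f(b)
        if f(x) > chord_height + tol:
            return False  # the curve pokes above this chord
    return True

def passes_chord_test(f, points):
    """Try every chord whose endpoints are drawn from `points`."""
    return all(lies_below_chord(f, a, b) for a in points for b in points)

def bowl(x):
    return x * x

def wiggly(x):
    # Two valleys near x = -1.22 and x = 1.22 with a bump at x = 0.
    return x ** 4 - 3.0 * x ** 2

grid = [i * 0.5 for i in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
print("bowl passes:", passes_chord_test(bowl, grid))
print("wiggly passes:", passes_chord_test(wiggly, grid))
```

The wiggly function fails on the chord from x = −1 to x = 1: both endpoints sit at height −2, yet the curve reaches 0 in the middle.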

The test you actually use: the second derivative

The chord definition is the meaning, but for a twice-differentiable function the working test is much simpler, and it reuses the curvature idea from two lessons back. A curve bends upward exactly when its slope never decreases, which is to say when the second derivative is non-negative:

f is convex  ⇔  f''(x) ≥ 0  for all x

f''(x) > 0 for all x   ⇒   f is strictly convex

So convexity is a statement about f'' everywhere, not at one lonely point. A parabola x² has f'' = 2 > 0, so it is strictly convex; so is eˣ (f'' = eˣ > 0). A straight line has f'' = 0, so it is convex but not strictly so.
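The "everywhere, not at one lonely point" requirement can be illustrated by scanning the second derivative over a range. This sketch estimates f'' with a central finite difference; the step size, the interval [−3, 3], and the helper names are assumptions made for the example, and a scan over a finite grid cannot replace the symbolic argument.

```python
import math

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def min_curvature(f, lo=-3.0, hi=3.0, n=61):
    """Smallest estimated f'' on an evenly spaced grid over [lo, hi]."""
    xs = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    return min(second_derivative(f, x) for x in xs)

examples = {
    "x^2": lambda x: x * x,
    "e^x": math.exp,
    "3x + 1": lambda x: 3.0 * x + 1.0,
    "x^4 - 3x^2": lambda x: x ** 4 - 3.0 * x ** 2,
}

for name, f in examples.items():
    print(f"{name:>11}: min f'' on [-3, 3] is about {min_curvature(f):.4f}")
```

The first two stay strictly positive, the line sits at zero, and the quartic dips to about −6 near x = 0, which is what disqualifies it even though its curvature is positive at most other points.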

Why convexity is the prize

Two consequences make convex functions the ones optimisers love:

Any local minimum is a global minimum. With no rival valleys, the moment you find a dip you are done — there is nothing lower hiding elsewhere. Gradient descent cannot get trapped in a bad local minimum, because there are none to be trapped in. This is the exact property the last lesson wished for.
A strictly convex function has at most one minimizer. Two distinct lowest points would force a flat-or-bulging stretch between them, contradicting strict convexity. So the solution, if it exists, is unique.
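The uniqueness claim follows in one step from the strict form of the chord inequality. A sketch of the argument, where x_1, x_2, and m are names introduced here for the two supposed minimizers and the minimum value:

```latex
\text{Suppose } x_1 \neq x_2 \text{ and } f(x_1) = f(x_2) = m = \min_x f(x).
\text{ Strict convexity, applied at the midpoint, gives}
\[
  f\!\left(\frac{x_1 + x_2}{2}\right)
  \;<\; \frac{1}{2} f(x_1) + \frac{1}{2} f(x_2)
  \;=\; m ,
\]
\text{which contradicts } m \text{ being the minimum value. Hence at most one minimizer exists.}
```

With plain (non-strict) convexity the inequality becomes ≤, the contradiction disappears, and a whole flat stretch of minimizers is possible.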

This is precisely why it matters that linear regression (squared loss) and logistic regression have convex objectives: any minimum that simple descent reaches is the global one, with no fear of a worse valley around the corner.

Watch it happen. Drop the marker anywhere on the convex bowl below and run gradient descent — wherever you start, the path slides to the same point at the bottom. No “bad starts,” no second valleys to fall into. That single picture is the convexity payoff made concrete.
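The same experiment can be sketched offline. The quadratic below is an assumed stand-in, not necessarily the function behind the demo: it was chosen so that its minimum sits at (2, −1) and its value at (−4, 3) is 100, which agrees with the demo's initial readout. The starting points and function names are hypothetical.

```python
def loss(x, y):
    """Assumed convex bowl with its minimum at (2, -1)."""
    return (x - 2.0) ** 2 + 4.0 * (y + 1.0) ** 2

def grad(x, y):
    """Gradient of the bowl above."""
    return 2.0 * (x - 2.0), 8.0 * (y + 1.0)

def gradient_descent(x, y, lr=0.1, steps=200):
    """Plain fixed-step gradient descent; returns the final point."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

# Hypothetical starting points scattered around the plane.
starts = [(-4.0, 3.0), (5.0, 5.0), (0.0, -6.0), (10.0, -10.0)]
finishes = [gradient_descent(x0, y0) for x0, y0 in starts]

for (x0, y0), (x, y) in zip(starts, finishes):
    print(f"start ({x0:5.1f}, {y0:5.1f}) -> finish ({x:.4f}, {y:.4f})")
```

Every start ends at the same point. Convexity removes bad valleys, not the need for a sensible step size: for this particular bowl a learning rate of 0.25 or more makes the y-coordinate oscillate or diverge.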

[Interactive demo: gradient descent. Click anywhere on the bowl to drop a ball and watch it roll downhill toward the target at (2, −1). Initial readout: x = −4.000, y = 3.000, loss = 100.000, steps = 0, learning rate = 0.100 (slider range 0.001 to 0.5).]

How GATE asks this

The signature question is an MSQ: “f is twice differentiable with f''(x) > 0 for all x — which of the following are always true?” You must tick the genuine consequences (at most one stationary point; any local min is global) and reject the tempting “f has a minimum,” which the eˣ counterexample kills. Occasionally it appears as an MCQ asking for the single guaranteed property, or a NAT pinning down the unique minimizer of a convex quadratic.

Worked example — a real GATE DA 2025 question

Let f be twice differentiable with f''(x) > 0 for all real x. Which of the following statements are always true?

(1) f is strictly convex.

(2) f has at most one point where f'(x) = 0.

(3) Any local minimum of f is a global minimum.

(4) f has a minimum.

Take the verdicts one at a time, the way the always-true drills taught you.

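(1) TRUE. f'' > 0 everywhere makes f' strictly increasing, which is sufficient for strict convexity. (It is a sufficient condition, not the definition: x⁴ is strictly convex even though its second derivative vanishes at x = 0.)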
(2) TRUE. f'' > 0 means f' is strictly increasing, so f' can cross zero at most once. Hence at most one stationary point.
(3) TRUE. For a convex function every local minimum is global — no lower valley can exist elsewhere.
(4) FALSE — the trap. Strict convexity does not force a minimum to exist. Take f(x) = eˣ: here f'' = eˣ > 0 for all x, so it satisfies the hypothesis, yet f' = eˣ is never zero and f has no minimum (it slides toward 0 as x → −∞ without ever attaining it).

So the always-true statements are (1), (2), and (3); statement (4) is false. This is a real GATE DA 2025 MSQ, and “f has a minimum” is precisely the distractor most students tick.
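The trap in statement (4) can also be seen by running descent on eˣ. The sketch below is an illustration with assumed settings (learning rate 0.1, start at 0); the comparison quadratic (x − 1)² is a hypothetical example of a strictly convex function that does attain its minimum.

```python
import math

def descend(df, x0, lr=0.1, steps=1000):
    """Fixed-step gradient descent in one variable, given the derivative df."""
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x

# f(x) = e^x has f'(x) = e^x, which is positive everywhere.
x_exp_short = descend(math.exp, 0.0, steps=100)
x_exp_long = descend(math.exp, 0.0, steps=10000)

# Comparison: f(x) = (x - 1)^2 has f'(x) = 2(x - 1) and a minimizer at x = 1.
x_quad = descend(lambda x: 2.0 * (x - 1.0), 0.0)

print("e^x after    100 steps: x =", round(x_exp_short, 3))
print("e^x after 10,000 steps: x =", round(x_exp_long, 3))
print("(x - 1)^2 after 1,000 steps: x =", round(x_quad, 6))
```

On the quadratic the iterates settle at the minimizer. On eˣ they keep drifting left, ever more slowly, because the slope never reaches zero: there is no stationary point to converge to.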

A question to carry forward

That completes the calculus toolkit — limits and how to compute them, the derivative and its rules, Taylor series, critical points, the second-derivative test, global optimisation on an interval, and now convexity. But notice something about every worked example so far: the problem told you which tool to use. The real exam does not. It hands you an unfamiliar expression and a single blank, and the first — often hardest — step is simply recognising which idea unlocks it: a conjugate here, a Taylor expansion there, an endpoint check, a convexity argument. Here is the thread into the final lesson of this chapter: how do you read a fresh problem and reach for the right instrument fast, under exam pressure, when nobody labels it for you?

In one breath

Convex = graph stays below every chord (one bowl); for twice-differentiable f this is exactly f''(x) ≥ 0 everywhere. f'' > 0 everywhere ⇒ strictly convex.
The prize: for a convex f, any local min is global; for a strictly convex f, the minimizer is unique (at most one).
The trap: convex does not imply a minimum exists — eˣ is strictly convex yet has none (the real GATE DA 2025 distractor).
Strict vs non-strict: f'' = 0 stretches are allowed in plain convexity (a line, a flat valley) but forbidden in strict convexity — strictness is what buys uniqueness.
It is why squared-loss and logistic regression are easy to optimise: no spurious local minima, so plain descent heads for the global optimum.

Practice

Quick check

Q1. Recall: which of these functions are convex on the whole real line? (select all that apply)

Q2. Trace: find the value of x at which the convex function f(x) = x² − 4x + 7 attains its minimum. (numerical answer)

Q3. Trace: for the same f(x) = x² − 4x + 7, what is the minimum VALUE of f? (numerical answer)

Q4. Apply: let f be twice differentiable with f''(x) > 0 for all x. Which statements are ALWAYS true? (select all that apply)

Q5. Apply: a student claims, 'My loss function has f'' > 0 everywhere, so gradient descent is guaranteed to converge to a minimum.' What is the flaw?

Q6. Create: which statements about a CONVEX (not necessarily strictly convex) function are always true? (select all that apply)
