Why does sigmoid saturation cause vanishing gradients, and why is tanh only a partial fix?

Sigmoid's derivative peaks at 0.25 and approaches zero in both tails, so backpropagating through many sigmoid layers multiplies together factors of at most 0.25 and the gradient shrinks exponentially with depth. Tanh's derivative peaks at 1 and its output is zero-centered, which helps keep weight updates balanced, but it still saturates: for large |x| its derivative again falls to near zero in both tails.
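The numbers behind that answer can be checked with a short sketch. It assumes the per-layer factor is the activation derivative alone (weights are ignored), and the helper names are illustrative only.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x: float) -> float:
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - math.tanh(x) ** 2

# Both derivatives peak at x = 0.
peak_sigmoid = sigmoid_grad(0.0)  # 0.25
peak_tanh = tanh_grad(0.0)        # 1.0

# Best case for 10 stacked sigmoids: every unit sits exactly at its peak.
# Assumption: weights are ignored, only activation derivatives are multiplied.
best_case_10_layers = peak_sigmoid ** 10

# Deep in the tail both derivatives are close to zero (saturation).
saturated_sigmoid = sigmoid_grad(10.0)
saturated_tanh = tanh_grad(10.0)

print(peak_sigmoid, peak_tanh, best_case_10_layers)
print(saturated_sigmoid, saturated_tanh)
```

Even in the best case the sigmoid chain loses a factor of 4 per layer, while tanh only avoids that loss when its inputs stay near zero.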

What loss function does logistic regression optimize, and why is it convex?

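Logistic regression minimizes binary cross-entropy (log-loss), the negative log-likelihood of Bernoulli-distributed labels whose success probability is the sigmoid of a linear function of the inputs. The Hessian of log-loss with respect to the weights is positive semi-definite everywhere, so the loss surface is convex and any local minimum is a global minimum. The minimizer is unique only when the loss is strictly convex, for example with a full-rank design matrix and non-separable data, or with L2 regularization.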
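A one-dimensional sketch makes the convexity claim concrete. It assumes a single weight w with no bias, and the four data points are made up for illustration. In this setting the second derivative of the mean log-loss is the mean of p(1 − p)x², which can never be negative.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w: float, xs, ys) -> float:
    # Mean binary cross-entropy for the one-weight model p = sigmoid(w * x).
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def log_loss_curvature(w: float, xs) -> float:
    # Second derivative in w: mean of p * (1 - p) * x^2, a sum of non-negative terms.
    total = 0.0
    for x in xs:
        p = sigmoid(w * x)
        total += p * (1.0 - p) * x * x
    return total / len(xs)

# Hypothetical, non-separable toy data.
xs = [-2.0, -1.0, 0.5, 1.5]
ys = [0, 1, 0, 1]

curvatures = [log_loss_curvature(w, xs) for w in (-3.0, -1.0, 0.0, 1.0, 3.0)]

# Midpoint check of convexity between w = -3 and w = 3.
midpoint_loss = log_loss(0.0, xs, ys)
chord_average = 0.5 * (log_loss(-3.0, xs, ys) + log_loss(3.0, xs, ys))

print(curvatures)
print(midpoint_loss, chord_average)
```

With several weights the same factor p(1 − p) appears on the diagonal of a matrix sandwiched between the data matrix and its transpose, which is why the full Hessian is positive semi-definite.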

Compare sigmoid, tanh, ReLU, leaky ReLU, and GELU — when would you pick each?

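Sigmoid squashes inputs to (0, 1) and saturates at the extremes, causing vanishing gradients, so it is mostly reserved for output layers that need a probability. Tanh is zero-centered but still saturates. ReLU avoids saturation for positive inputs and trains fast, but a unit whose input stays negative receives zero gradient and can die. Leaky ReLU keeps a small nonzero slope for negative inputs, so such units can still recover. GELU is smooth, weighting each input by the standard normal CDF Φ(x), and is a common default in transformer architectures such as BERT and GPT-2.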
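For reference, the five activations can be written as scalar functions in a few lines. This is a minimal sketch rather than any library's implementation: the 0.01 negative slope is a commonly used value chosen here as an assumption, and GELU is written in its exact form x·Φ(x) using the error function.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x: float) -> float:
    return math.tanh(x)

def relu(x: float) -> float:
    return max(0.0, x)

def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    # negative_slope = 0.01 is an assumed default for this sketch.
    return x if x > 0 else negative_slope * x

def gelu(x: float) -> float:
    # Exact form: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, sigmoid(x), tanh(x), relu(x), leaky_relu(x), gelu(x))
```

Reading across a row shows the differences: at x = −3 ReLU returns exactly zero, leaky ReLU a small negative value, and GELU a small negative value that fades smoothly toward zero.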

What is gradient clipping, and when is it necessary?

Gradient clipping caps the norm (or per-element value) of gradients before the optimiser step, preventing any single update from being so large that it destabilises training. It is especially important in recurrent networks and deep transformers, where gradients can explode as they are propagated back through many time steps or layers, and in any network trained with a high learning rate on noisy data.
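Both variants mentioned above fit in a few lines. The sketch below works on a flat list of floats and assumes max_norm and limit are positive; the function names are illustrative and do not belong to any particular framework.

```python
import math

def clip_by_global_norm(grads, max_norm: float):
    # Rescale the whole gradient vector so its L2 norm is at most max_norm.
    # The direction is preserved; only the length changes.
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm

def clip_by_value(grads, limit: float):
    # Clamp each component independently to [-limit, limit].
    # This can change the direction of the gradient vector.
    return [max(-limit, min(limit, g)) for g in grads]

grads = [3.0, -4.0]  # L2 norm is 5.0
clipped_norm, norm_before = clip_by_global_norm(grads, max_norm=1.0)
clipped_value = clip_by_value(grads, limit=1.0)

print(norm_before, clipped_norm, clipped_value)
```

Norm clipping is the usual choice when the update direction matters, since value clipping distorts it whenever only some components exceed the limit.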

Critical Points & Monotonicity — GATE DA

What you'll learn

A critical point is where `f'(x) = 0` or `f'` is undefined — the only candidates for local extrema

`f' > 0` means increasing, `f' < 0` means decreasing

A sign change of `f'` classifies the point: minus-to-plus is a local min, plus-to-minus is a local max

`f'(x) = 0` is a candidate, not a guarantee — `x³` flattens at 0 but has no extremum there

Last lesson left you standing at the top of a hill, where the curve goes flat and the slope drops to zero. Let us put that picture to work. Imagine walking along the graph of a function — uphill, then downhill, then uphill again — where the slope under your feet at each step is just f'(x). The only places you ever stop climbing or falling are the tops of hills and the bottoms of valleys, and at exactly those spots the ground goes flat.

That flatness is the move the exam wants from you: to find the peaks and troughs of a function, hunt for the flat spots first, then glance at the slope on either side to see which kind you have landed on. It is also the bedrock of how every machine-learning model trains — “minimise the loss” means “find where the gradient flattens to zero,” so spotting flat spots is the very same move you will make on a loss curve later.

Critical points: where the slope flattens

A critical point of f is a value x where

f'(x) = 0        (the tangent is flat)
   or
f'(x) is undefined   (a corner, cusp, or vertical tangent)

These are the only interior points where a function can have a local extremum (a local max or min). Everywhere else f' exists and is nonzero, so the function is strictly rising or strictly falling through that point and cannot turn around there.

Drag the point along the curve below and watch the tangent. Where the tangent line goes flat, you are standing on a critical point; step to either side and the slope picks up a clear sign again.

[Interactive figure: Derivatives · slope of the tangent. Drag the point along the curve and read the slope off the tangent line. The secant line touches the curve at two points a distance h apart; as h shrinks toward 0 it rotates into the tangent, and that limiting slope is the derivative f'(x). Sample reading: at x = 1.200, f(x) = 1.440 and h = 0.800, the secant slope (f(x+h) − f(x)) / h is 3.200 while the tangent slope f'(x) is 2.400.]

Monotonicity: reading the sign of f'

Between two consecutive critical points f' exists and is never zero, so its sign cannot change, and that sign is the direction of travel:

f'(x) > 0 on an interval ⇒ f is increasing there.
f'(x) < 0 on an interval ⇒ f is decreasing there.

The interesting thing happens as you cross a critical point — watch how the sign flips:

minus then plus (− to +): the curve was falling, then rises — a local minimum.
plus then minus (+ to −): the curve was rising, then falls — a local maximum.
no sign change (+ to +, or − to −): not an extremum — the curve flattens for an instant but keeps going the same way.

That recipe is the first-derivative test: locate the critical points, then read the sign of f' just left and just right of each one.
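The test translates almost word for word into code. The sketch below assumes you already have f' as a Python function and that no other critical point lies within eps of the one being tested; the function name and the eps default are illustrative choices.

```python
from typing import Callable

def classify_critical_point(f_prime: Callable[[float], float],
                            c: float, eps: float = 1e-3) -> str:
    # Probe the sign of f' just left and just right of the critical point c.
    # f' is never evaluated at c itself, so points where f' is undefined work too.
    left = f_prime(c - eps)
    right = f_prime(c + eps)
    if left < 0 < right:
        return "local min"   # falling, then rising
    if left > 0 > right:
        return "local max"   # rising, then falling
    return "neither"         # no strict sign flip detected

results = {
    "x^2 at 0": classify_critical_point(lambda x: 2 * x, 0.0),
    "-x^2 at 0": classify_critical_point(lambda x: -2 * x, 0.0),
    "x^3 at 0": classify_critical_point(lambda x: 3 * x * x, 0.0),
    # |x| has no derivative at 0, but the slope is -1 on the left and +1 on the right.
    "|x| at 0": classify_critical_point(lambda x: 1.0 if x > 0 else -1.0, 0.0),
}
print(results)
```

The x³ case is the one to remember: the derivative is zero at 0 but positive on both sides, so the function reports neither a max nor a min.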

Flat tangent at each critical point; f′ flips sign across each one — plus-to-minus at a max, minus-to-plus at a min.

How GATE asks this

Expect an MCQ or NAT that either (a) asks for the critical point(s) of a given function, or (b) asks for the interval where the function is increasing or decreasing. The recipe is mechanical: differentiate, set f'(x) = 0, solve for the candidate x-values, and read the sign of f' on each piece of the number line. Cubics like f(x) = x³ − 3x are the standard vehicle — two critical points, one max and one min.

Worked example

Find the critical points of f(x) = x³ − 3x and the intervals where it is increasing or decreasing.

Differentiate and set the derivative to zero:

f'(x) = 3x² − 3 = 3(x² − 1) = 3(x − 1)(x + 1)

f'(x) = 0   →   x = −1   and   x = +1     ← the two critical points

Now test the sign of f'(x) = 3(x − 1)(x + 1) on the three intervals the critical points carve out of the number line:

interval        test x     (x+1)   (x−1)    f'(x)    behaviour
─────────────────────────────────────────────────────────────────
x < −1          x = −2       −        −       +      increasing
−1 < x < 1      x =  0       +        −       −      decreasing
x > 1           x =  2       +        +       +      increasing

Reading the sign changes across each critical point:

at x = −1: f' goes + → − ⇒ local maximum (value f(−1) = (−1)³ − 3(−1) = −1 + 3 = 2).
at x = 1: f' goes − → + ⇒ local minimum (value f(1) = 1 − 3 = −2).

So f increases on (−∞, −1), decreases on (−1, 1), and increases again on (1, ∞) — a local max at x = −1 and a local min at x = 1, exactly the two-flat-spot shape you predicted for a cubic.
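If you want to double-check the arithmetic, the sign table and the two extreme values can be reproduced in a few lines. The test points are the same ones used in the table; the variable names are illustrative.

```python
def f(x: float) -> float:
    return x ** 3 - 3 * x

def f_prime(x: float) -> float:
    return 3 * x ** 2 - 3

critical_points = [-1.0, 1.0]

# One test point per interval carved out by the critical points.
test_points = {"x < -1": -2.0, "-1 < x < 1": 0.0, "x > 1": 2.0}

behaviour = {
    label: "increasing" if f_prime(x) > 0 else "decreasing"
    for label, x in test_points.items()
}
extreme_values = {c: f(c) for c in critical_points}

print(behaviour)       # increasing, decreasing, increasing
print(extreme_values)  # {-1.0: 2.0, 1.0: -2.0}
```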

A question to carry forward

The first-derivative test works, but notice the labour: for every candidate you must probe the sign of f' on both sides and reason through the flip. With several critical points, or under exam time, that bookkeeping adds up. Yet Taylor already hinted at a shortcut — the second derivative measures the bend of a curve: it cups upward at the bottom of a valley and arches downward at the top of a hill. Here is the thread onward: could a single number — the value of f'' at the critical point — classify it as a peak or a trough in one step, and then let us solve genuine optimisation problems like “what dimensions use the least material”?

In one breath

A critical point is where f'(x) = 0 or f' is undefined — the only candidates for a local max or min.
Sign of f' is direction: f' > 0 increasing, f' < 0 decreasing.
First-derivative test: across a critical point, −→+ is a local min, +→− is a local max, no change is neither.
f'(x) = 0 is a candidate, not a verdict: x³ has f'(0) = 0 but no extremum (an inflection — f' stays positive both sides).
Don’t miss critical points where f' is undefined — the corner of |x| is a true minimum the f'=0 rule alone would skip.

Practice

Quick check

Q1. Recall: which statements about critical points are TRUE? (select all that apply)

Q2. Trace: f(x) = 2x³ − 6x² + 5. Find the LARGER of its two critical points. (numerical answer)

Q3. Apply: for f(x) = x³ − 3x, at which x does the LOCAL MAXIMUM occur? (numerical answer)

Q4. Apply: on which interval is f(x) = x³ − 3x DECREASING?

Q5. Create: a function has f'(x) = (x − 2)²(x + 3). Which of the following are TRUE? (select all that apply)
