Critical Points & Monotonicity
Where a curve flattens out: f'(x) = 0 marks the candidates for peaks and valleys, and the sign of f' tells you where the function climbs or falls.
What you'll learn
- A critical point is where `f'(x) = 0` or `f'` is undefined — the only candidates for local extrema
- `f' > 0` means increasing, `f' < 0` means decreasing
- A sign change of `f'` classifies the point: minus-to-plus is a local min, plus-to-minus is a local max
- `f'(x) = 0` is a candidate, not a guarantee — `x³` flattens at 0 but has no extremum there
Before you start
Picture yourself walking along the graph of a function. Uphill, downhill, uphill —
the slope under your feet is just f'(x). The only places you ever stop climbing or
falling are the tops of hills and the bottoms of valleys, and at exactly those points
the ground is flat. That’s the trick the exam wants you to use: to find the
peaks and troughs of a function, hunt for the flat spots first. The sign of f' on
either side tells you which kind you’ve landed on.
This is also the idea behind gradient-based training of ML models: “minimise the loss” means searching for a point where the gradient flattens to zero, so spotting those flat spots is the same move you’ll make on a loss curve later.
Critical points: where the slope flattens
A critical point of f is a value x in the domain of f where
f'(x) = 0 (the tangent is flat)
or
f'(x) is undefined (a corner, cusp, or vertical tangent)
These are the only places a continuous function can have a local extremum (a
local max or min). Everywhere else f' has a definite sign, so the function is
strictly going up or down and cannot turn around.
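The two cases in the definition can be told apart numerically by comparing the slope just to the left and just to the right of a point. The sketch below is only a heuristic: the helper names, the step `h`, and the tolerance `tol` are illustrative choices, and it will catch corners such as `|x|` at 0 but can miss vertical tangents.

```python
def one_sided_slopes(f, x, h=1e-6):
    """Approximate the left and right slopes of f at x with difference quotients."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right


def looks_critical(f, x, h=1e-6, tol=1e-4):
    """Heuristic label for x: 'undefined', 'flat', or 'not critical'.

    Assumption: h and tol are hand-picked for well-scaled functions.
    """
    left, right = one_sided_slopes(f, x, h)
    if abs(left - right) > tol:
        # The two one-sided slopes disagree, so f' does not exist here.
        return "undefined"
    if abs(left) < tol:
        # Both slopes agree and are (numerically) zero: a flat tangent.
        return "flat"
    return "not critical"


print(looks_critical(lambda x: x * x, 0.0))  # flat tangent at the bottom of x^2
print(looks_critical(abs, 0.0))              # corner of |x|
print(looks_critical(lambda x: x * x, 1.0))  # slope is about 2 here
```

In an exam you would differentiate by hand instead; the numerical check is just a way to see what "flat" and "undefined" mean at a specific point.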
Follow the tangent line as a point moves along the curve. Where the tangent goes flat, you are standing on a critical point; on either side the slope has a clear sign.
Monotonicity: reading the sign of f'
Between critical points the sign of f' is constant, and that sign is the
direction of travel:
- `f'(x) > 0` on an interval ⇒ `f` is increasing there.
- `f'(x) < 0` on an interval ⇒ `f` is decreasing there.
When you cross a critical point, watch how the sign flips:
- minus then plus (`−` to `+`): the curve was falling, then rises, so this is a local minimum.
- plus then minus (`+` to `−`): the curve was rising, then falls, so this is a local maximum.
- no sign change (`+` to `+`, or `−` to `−`): not an extremum; the curve flattens but keeps going the same way.
This is the first-derivative test: locate critical points, then read the sign of
f' just left and right of each.
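The first-derivative test can be sketched in a few lines of Python. This assumes you already have a formula for `f'` and a known critical point `c`; the function name `classify` and the offset `delta` are illustrative, and `delta` must be small enough that no other critical point lies within it.

```python
def classify(fprime, c, delta=1e-3):
    """Classify the critical point c by the sign of f' just left and right of it."""
    left = fprime(c - delta)
    right = fprime(c + delta)
    if left < 0 < right:
        return "local min"      # falling, then rising
    if left > 0 > right:
        return "local max"      # rising, then falling
    if left * right > 0:
        return "no extremum"    # same sign on both sides
    return "inconclusive"       # f' is zero at a test point; try another delta


# f(x) = x^3 - 3x has f'(x) = 3x^2 - 3, with critical points at -1 and 1.
print(classify(lambda x: 3 * x**2 - 3, -1))  # local max
print(classify(lambda x: 3 * x**2 - 3, 1))   # local min

# f(x) = x^3 has f'(x) = 3x^2: flat at 0, but no sign change.
print(classify(lambda x: 3 * x**2, 0))       # no extremum
```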
f' flips sign across each extremum: plus-to-minus at a max, minus-to-plus at a min.
How GATE asks this
Expect an MCQ or NAT that either (a) asks for the critical point(s) of a given
function, or (b) asks for the interval where the function is increasing or
decreasing. The recipe is mechanical: differentiate, set f'(x) = 0, solve for the
candidate x-values, and read the sign of f' on each piece of the number line.
Cubics like f(x) = x³ − 3x are the standard vehicle — two critical points, one
max and one min.
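For a cubic, the "differentiate, set to zero, solve" part of the recipe reduces to a quadratic equation. The following is a minimal sketch of that step, assuming the polynomial is given as a list of coefficients with the highest power first; both helper names are made up for illustration.

```python
import math


def derivative_coeffs(coeffs):
    """Differentiate a polynomial given as coefficients, highest power first."""
    n = len(coeffs) - 1  # degree of the polynomial
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def quadratic_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 in ascending order (assumes a != 0)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []                      # f' never vanishes: no critical points
    if disc == 0:
        return [-b / (2 * a)]          # one repeated root
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])


coeffs = [1, 0, -3, 0]                 # f(x) = x^3 - 3x
a, b, c = derivative_coeffs(coeffs)    # f'(x) = 3x^2 + 0x - 3
critical_points = quadratic_roots(a, b, c)
print(critical_points)
```

The discriminant tells you up front how many critical points a cubic has: two, one, or none.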
Worked example
Find the critical points of
f(x) = x³ − 3x and the intervals where it is increasing or decreasing.
Differentiate and set the derivative to zero:
f'(x) = 3x² − 3 = 3(x² − 1) = 3(x − 1)(x + 1)
f'(x) = 0 → x = −1 and x = +1 ← the two critical points
Now test the sign of f'(x) = 3(x − 1)(x + 1) on the three intervals the critical
points carve out:
interval      test x    (x+1)   (x−1)   f'(x)   behaviour
─────────────────────────────────────────────────────────────────
x < −1        x = −2      −       −       +     increasing
−1 < x < 1    x = 0       +       −       −     decreasing
x > 1         x = 2       +       +       +     increasing
Reading the sign changes:
- at `x = −1`: `f'` goes `+ → −` ⇒ local maximum (value `f(−1) = (−1)³ − 3(−1) = −1 + 3 = 2`).
- at `x = 1`: `f'` goes `− → +` ⇒ local minimum (value `f(1) = 1 − 3 = −2`).
So f increases on (−∞, −1), decreases on (−1, 1), and increases again on
(1, ∞) — a local max at x = −1 and a local min at x = 1.
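If you want to double-check the sign table, a short script can evaluate `f'` at the same test points and `f` at the two critical points. This is just a sanity check of the worked example; the dictionary labels are plain strings chosen for readability.

```python
def f(x):
    return x**3 - 3 * x


def fprime(x):
    return 3 * x**2 - 3


# One test point inside each interval carved out by the critical points.
test_points = {"x < -1": -2, "-1 < x < 1": 0, "x > 1": 2}

behaviour = {
    label: ("increasing" if fprime(x) > 0 else "decreasing")
    for label, x in test_points.items()
}

for label, result in behaviour.items():
    print(f"{label:12s} f' = {fprime(test_points[label]):+d}  {result}")

# Function values at the critical points.
print("f(-1) =", f(-1))  # local maximum value
print("f(1)  =", f(1))   # local minimum value
```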
Quick check
Practice this in an interview
Sigmoid's derivative peaks at 0.25 and approaches zero in both tails, so the chain of gradient multiplications collapses exponentially in deep networks. Tanh's derivative peaks at 1 and its output is zero-centered, which helps weight update symmetry, but it still saturates at large magnitudes and the gradient still shrinks to near-zero in both tails.
Logistic regression minimizes binary cross-entropy (log-loss), which is the negative log-likelihood of the Bernoulli distribution given the sigmoid-transformed linear predictions. The Hessian of log-loss is positive semi-definite everywhere, guaranteeing a convex surface, so any local minimum is also a global minimum.
Sigmoid squashes to (0,1) and saturates at extremes, causing vanishing gradients. Tanh is zero-centered but still saturates. ReLU avoids saturation for positive inputs and trains fast but can produce dead neurons. Leaky ReLU fixes dying neurons. GELU is smooth and probabilistic, and is widely used in transformer architectures.
Gradient clipping caps the norm (or per-element value) of gradients before the optimiser step, preventing any single update from being so large that it destabilises training. It is especially important in recurrent networks and transformers where gradients can explode across many time steps or attention heads, and in any network trained with a high learning rate on noisy data.