datarekha

Critical Points & Monotonicity

Where a curve flattens out: f'(x) = 0 marks the candidates for peaks and valleys, and the sign of f' tells you where the function climbs or falls.

6 min read · Intermediate · GATE DA · Lesson 42 of 122

What you'll learn

  • A critical point is where `f'(x) = 0` or `f'` is undefined — the only candidates for local extrema
  • `f' > 0` means increasing, `f' < 0` means decreasing
  • A sign change of `f'` classifies the point: minus-to-plus is a local min, plus-to-minus is a local max
  • `f'(x) = 0` is a candidate, not a guarantee — `x³` flattens at 0 but has no extremum there

Before you start

Picture yourself walking along the graph of a function. Uphill, downhill, uphill — the slope under your feet is just f'(x). The only places you ever stop climbing or falling are the tops of hills and the bottoms of valleys, and at exactly those points the ground is flat. That’s the trick the exam wants you to use: to find the peaks and troughs of a function, hunt for the flat spots first. The sign of f' on either side tells you which kind you’ve landed on.

This is also the idea behind how most ML models are trained: “minimise the loss” means searching for a point where the gradient is zero, so spotting those flat spots is the same move you’ll make on a loss curve later.
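To make the loss-curve connection concrete, here is a minimal sketch of gradient descent on a one-variable loss. The loss `(w - 3)^2`, the starting point, and the learning rate are illustrative assumptions, not taken from any particular model.

```python
def loss(w):
    # Hypothetical one-parameter loss with its only flat spot at w = 3.
    return (w - 3.0) ** 2


def grad(w):
    # Derivative of the loss above: d/dw (w - 3)^2 = 2(w - 3).
    return 2.0 * (w - 3.0)


w = 0.0      # assumed starting point
lr = 0.1     # assumed learning rate
for step in range(100):
    w -= lr * grad(w)   # step against the slope

print(w, grad(w))  # w ends very close to 3, where the gradient is close to 0
```

Each step shrinks the distance to `w = 3` by a factor of 0.8, so the iterate settles at the critical point of the loss, which here is also its minimum.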

Critical points: where the slope flattens

A critical point of f is a value x where

f'(x) = 0        (the tangent is flat)
   or
f'(x) is undefined   (a corner, cusp, or vertical tangent)

These are the only places where a function can have a local extremum (a local max or min) in the interior of its domain. At any other point f' exists and is nonzero, so the function is strictly rising or falling through that point and cannot turn around there.
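Both kinds of critical point can be spotted numerically. The sketch below is a rough check only: the helper names, step size `h`, and tolerance `tol` are assumptions chosen for illustration, and it compares one-sided difference quotients rather than computing f' exactly.

```python
def one_sided_slopes(f, x, h=1e-6):
    """Return the (left, right) difference quotients of f at x."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right


def is_critical(f, x, h=1e-6, tol=1e-4):
    """Rough check: the slope is ~0, or the one-sided slopes disagree
    (a corner, so f' is undefined there)."""
    left, right = one_sided_slopes(f, x, h)
    is_flat = abs(left) < tol and abs(right) < tol
    is_corner = abs(left - right) > tol
    return is_flat or is_corner


cubic = lambda x: x**3 - 3*x   # f'(x) = 3x^2 - 3, zero at x = -1 and x = 1
absval = lambda x: abs(x)      # slope -1 left of 0, +1 right of 0

print(is_critical(cubic, 1.0))    # flat tangent
print(is_critical(cubic, 0.0))    # slope is -3 here, not critical
print(is_critical(absval, 0.0))   # corner: f' undefined
```

This check does not detect vertical tangents, where the difference quotients blow up on both sides; those need a separate test.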


Monotonicity: reading the sign of f'

Between consecutive critical points the sign of f' cannot change, and that sign is the direction of travel:

  • f'(x) > 0 on an interval ⇒ f is increasing there.
  • f'(x) < 0 on an interval ⇒ f is decreasing there.

When you cross a critical point, watch how the sign flips:

  • minus then plus (− to +): the curve was falling, then rises, so the point is a local minimum.
  • plus then minus (+ to −): the curve was rising, then falls, so the point is a local maximum.
  • no sign change (+ to +, or − to −): not an extremum. The curve flattens but keeps going the same way.

This is the first-derivative test: locate critical points, then read the sign of f' just left and right of each.
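The first-derivative test translates almost directly into code. The following is a minimal sketch, assuming you already have f' as a function and that no other critical point lies within `h` of the one being tested; the name `classify` and the default `h` are illustrative choices.

```python
def classify(fprime, c, h=1e-3):
    """First-derivative test at a critical point c.

    Compares the sign of f' just left and just right of c.
    """
    left, right = fprime(c - h), fprime(c + h)
    if left < 0 < right:
        return "local min"      # minus to plus
    if left > 0 > right:
        return "local max"      # plus to minus
    return "no extremum"        # sign did not change


d_cubic = lambda x: 3*x**2 - 3   # derivative of x^3 - 3x
d_cube = lambda x: 3*x**2        # derivative of x^3

print(classify(d_cubic, -1.0))   # local max
print(classify(d_cubic, 1.0))    # local min
print(classify(d_cube, 0.0))     # no extremum
```

The last call is the `x³` case: f'(0) = 0, but f' is positive on both sides, so the flat spot is not an extremum.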

[Figure: a curve with one local max and one local min; f′ > 0 before the max, f′ < 0 between the two, f′ > 0 after the min.]
Flat tangent at each critical point; f′ flips sign across each one: plus-to-minus at a max, minus-to-plus at a min.

How GATE asks this

Expect an MCQ or NAT that either (a) asks for the critical point(s) of a given function, or (b) asks for the interval where the function is increasing or decreasing. The recipe is mechanical: differentiate, set f'(x) = 0, solve for the candidate x-values, and read the sign of f' on each piece of the number line. Cubics like f(x) = x³ − 3x are the standard vehicle — two critical points, one max and one min.

Worked example

Find the critical points of f(x) = x³ − 3x and the intervals where it is increasing or decreasing.

Differentiate and set the derivative to zero:

f'(x) = 3x² − 3 = 3(x² − 1) = 3(x − 1)(x + 1)

f'(x) = 0   →   x = −1   and   x = +1     ← the two critical points

Now test the sign of f'(x) = 3(x − 1)(x + 1) on the three intervals the critical points carve out:

interval        test x     (x+1)   (x−1)    f'(x)    behaviour
─────────────────────────────────────────────────────────────────
x < −1          x = −2       −        −       +      increasing
−1 < x < 1      x =  0       +        −       −      decreasing
x > 1           x =  2       +        +       +      increasing

Reading the sign changes:

  • at x = −1: f' goes + → −, so this is a local maximum (value f(−1) = (−1)³ − 3(−1) = −1 + 3 = 2).
  • at x = 1: f' goes − → +, so this is a local minimum (value f(1) = 1 − 3 = −2).

So f increases on (−∞, −1), decreases on (−1, 1), and increases again on (1, ∞) — a local max at x = −1 and a local min at x = 1.
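The whole worked example can be double-checked in a few lines. This is a sketch under the assumption that f' is a quadratic, so the quadratic formula gives the critical points; the variable names are illustrative.

```python
import math


def f(x):
    return x**3 - 3*x


def fprime(x):
    return 3*x**2 - 3          # = 3(x - 1)(x + 1)


# Solve f'(x) = 3x^2 + 0x - 3 = 0 with the quadratic formula.
a, b, c = 3.0, 0.0, -3.0
disc = b*b - 4*a*c
roots = sorted([(-b - math.sqrt(disc)) / (2*a),
                (-b + math.sqrt(disc)) / (2*a)])

# One test point per interval: left of both roots, between them, right of both.
tests = [roots[0] - 1, (roots[0] + roots[1]) / 2, roots[1] + 1]
behaviour = ["increasing" if fprime(t) > 0 else "decreasing" for t in tests]

for t, b_ in zip(tests, behaviour):
    print(f"test x = {t:+.0f}: f'(x) = {fprime(t):+.0f}, {b_}")

values = [f(r) for r in roots]   # f at the critical points
print(roots, values)
```

The test points come out as −2, 0, and 2, matching the sign table, and the values at the critical points match the 2 and −2 computed by hand.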

Quick check

Q1. For f(x) = x³ − 3x, at which x does the local maximum occur? (Numerical answer.)
Q2. f(x) = 2x³ − 6x² + 5. Find the larger of its two critical points. (Numerical answer.)
Q3. On which interval is f(x) = x³ − 3x decreasing?
Q4. Which statements about critical points are true? (Select all that apply.)
Q5. A function has f'(x) = (x − 2)²(x + 3). Which of the following are true? (Select all that apply.)

Practice this in an interview

Why does sigmoid saturation cause vanishing gradients, and why is tanh only a partial fix?

Sigmoid's derivative peaks at 0.25 and approaches zero in both tails, so a long chain of gradient multiplications can shrink exponentially in deep networks. Tanh's derivative peaks at 1 and its output is zero-centered, which helps weight update symmetry, but it still saturates at large magnitudes, so the gradient still shrinks to near-zero in both tails.
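The numbers in this answer are easy to verify. The sketch below uses only the standard library; the depth of 20 layers and the choice to ignore weight matrices are simplifying assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)                 # sigma'(x) = sigma(x)(1 - sigma(x))


def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2       # tanh'(x) = 1 - tanh(x)^2


peak_sigmoid = d_sigmoid(0.0)    # largest value the sigmoid derivative takes
peak_tanh = d_tanh(0.0)          # largest value the tanh derivative takes
tail_sigmoid = d_sigmoid(10.0)   # deep in the saturated tail
tail_tanh = d_tanh(10.0)

# Best case for a chain of 20 sigmoid derivative factors (weights ignored).
bound = peak_sigmoid ** 20

print(peak_sigmoid, peak_tanh)   # 0.25 1.0
print(tail_sigmoid, tail_tanh, bound)
```

Even at its peak, twenty sigmoid factors multiply to less than one part in a trillion, while tanh's larger peak helps only as long as inputs stay near zero.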

What loss function does logistic regression optimize, and why is it convex?

Logistic regression minimizes binary cross-entropy (log-loss), which is the negative log-likelihood of the Bernoulli distribution given the sigmoid-transformed linear predictions. The Hessian of log-loss is positive semi-definite everywhere, so the loss surface is convex and any local minimum is also a global minimum.

Compare sigmoid, tanh, ReLU, leaky ReLU, and GELU — when would you pick each?

Sigmoid squashes to (0,1) and saturates at extremes, causing vanishing gradients. Tanh is zero-centered but still saturates. ReLU avoids saturation for positive inputs and trains fast but can produce dead neurons. Leaky ReLU keeps a small slope for negative inputs, which mitigates dying neurons. GELU is smooth, weights each input by a Gaussian probability, and is a common default in transformer architectures.

What is gradient clipping, and when is it necessary?

Gradient clipping caps the norm (or per-element value) of gradients before the optimiser step, preventing any single update from being so large that it destabilises training. It is especially important in recurrent networks and transformers where gradients can explode across many time steps or attention heads, and in any network trained with a high learning rate on noisy data.
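Norm-based clipping is short enough to sketch directly. This is a plain-Python illustration on a list of floats, not the API of any particular framework; the function name `clip_by_norm` and the threshold are assumptions.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm.

    The direction is preserved; only the length is capped.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)               # already small enough, leave as is
    scale = max_norm / norm
    return [g * scale for g in grads]


big = clip_by_norm([3.0, 4.0], max_norm=1.0)     # norm 5, gets scaled down
small = clip_by_norm([0.3, 0.4], max_norm=1.0)   # norm 0.5, unchanged

print(big, small)
```

Clipping by norm keeps the update pointing the same way, whereas per-element clipping can change the direction of the step.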
