datarekha

Differentiability

A function is differentiable where it has a unique tangent slope. Differentiable implies continuous — but NOT the reverse, as ReLU and |x| show at their corner.

7 min read Intermediate GATE DA Lesson 39 of 122

What you'll learn

  • f is differentiable at a when f'(a) exists — a single, unique tangent slope
  • Differentiable ⇒ continuous, but the converse fails (|x| and ReLU at 0)
  • Sums, products, quotients (denominator ≠ 0), and compositions of differentiable functions are differentiable
  • The ML tie-in: ReLU = max(0, x) is continuous everywhere but not differentiable at 0 (asked in GATE DA 2025)

Before you start

Zoom in on a smooth curve far enough and it starts to look like a straight line. That line is the tangent, and its slope is the derivative. A function is differentiable at a point when that zoom-in works — when there is one clean tangent line waiting for you at the bottom of the zoom.

Where this breaks is at corners. A sharp kink means the curve arrives with one slope from the left and leaves with another to the right. No single tangent exists. The most famous example in deep learning sits right at the origin: the ReLU activation max(0, x) is continuous everywhere, but its corner at 0 makes it non-differentiable there. GATE DA 2025 asked exactly this — so it’s a fact worth understanding rather than memorising.

Differentiable at a point

f is differentiable at a if the derivative

            f(a + h) − f(a)
f'(a) = lim ───────────────
        h→0        h

exists, meaning the limit gives one finite number regardless of whether h approaches 0 from the left or the right. Geometrically, the secant line through (a, f(a)) and (a + h, f(a + h)) rotates as h → 0 and settles onto a single tangent, whose slope is f'(a).

When the left-hand slope and the right-hand slope disagree, no single limit exists, and the function is not differentiable there — even if the graph is unbroken.

[Figure, two panels. Left: "smooth: one tangent exists". Right: "|x|: corner at 0, no single slope", with slope −1 to the left of 0 and slope +1 to the right.]
Left: a smooth curve has one tangent. Right: at the corner of |x| the left slope (−1) and right slope (+1) disagree, so it is not differentiable at 0.
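The left-versus-right comparison can be tried numerically. The sketch below is illustrative only: the helper names `one_sided_slopes` and `looks_differentiable`, the step size, and the tolerance are assumptions made for this example, and agreement of two secant slopes at one small h is a heuristic, not a proof of differentiability.

```python
def one_sided_slopes(f, a, h=1e-6):
    """Return (left, right) secant slopes of f at a for a small step h."""
    left = (f(a) - f(a - h)) / h    # secant slope using a point just left of a
    right = (f(a + h) - f(a)) / h   # secant slope using a point just right of a
    return left, right


def looks_differentiable(f, a, h=1e-6, tol=1e-4):
    """Heuristic check: do the two one-sided secant slopes agree within tol?"""
    left, right = one_sided_slopes(f, a, h)
    return abs(left - right) < tol


def square(x):
    return x * x


print(one_sided_slopes(square, 1.0))      # both values are close to 2
print(one_sided_slopes(abs, 0.0))         # (-1.0, 1.0)
print(looks_differentiable(square, 1.0))  # True
print(looks_differentiable(abs, 0.0))     # False
```

For x² at 1 the two slopes close in on the same number, while for |x| at 0 they stay at −1 and +1 no matter how small h gets.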

The one-way implication

Here is the relationship GATE tests most:

differentiable at a   ⇒   continuous at a          (always)
continuous at a       ⇒   differentiable at a      (NOT in general)

If a function has a tangent slope it certainly cannot jump, so differentiability forces continuity. The reverse fails: |x| and ReLU = max(0, x) are continuous everywhere, yet their corner at 0 kills differentiability. Continuity is the weaker condition; differentiability is the stronger one.
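The forward implication can be written out in one line. Assuming f'(a) exists as defined earlier, split the increment f(a + h) − f(a) into the difference quotient times h and take the limit of each factor:

```latex
\lim_{h \to 0} \bigl[ f(a+h) - f(a) \bigr]
  = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} \cdot h
  = f'(a) \cdot 0
  = 0
```

So f(a + h) → f(a) as h → 0, which is exactly continuity at a. The argument cannot be run backwards: knowing the increment tends to 0 says nothing about whether the quotient settles on a single value.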

Building differentiable functions

Differentiability is preserved by the usual algebra, so you can certify big expressions from small pieces. If f and g are differentiable at a, then so are:

  • their sum f + g and difference f − g,
  • their product f · g,
  • their quotient f / g — provided g(a) ≠ 0,
  • their composition f(g(x)) (the chain rule).

So a product like f·g is differentiable wherever both factors are, and a quotient inherits differentiability everywhere the denominator stays non-zero.
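As a sanity check on these rules, the following sketch compares a numerical slope of each combination with the value given by the product, quotient, and chain rules. The choices f(x) = sin x, g(x) = x² + 1, the test point 0.7, and the helper `derivative` are assumptions picked for illustration; a central difference only approximates the derivative.

```python
import math


def derivative(fn, x, h=1e-5):
    """Central-difference estimate of fn'(x); an approximation, not a proof."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


# Illustrative building blocks: f(x) = sin x and g(x) = x^2 + 1 (never zero).
f, df = math.sin, math.cos


def g(x):
    return x * x + 1


def dg(x):
    return 2 * x


x = 0.7  # arbitrary test point

# Closed-form slopes from the product, quotient, and chain rules.
product_rule = df(x) * g(x) + f(x) * dg(x)
quotient_rule = (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2
chain_rule = df(g(x)) * dg(x)

# Numerical slopes of the combined functions at the same point.
product_num = derivative(lambda t: f(t) * g(t), x)
quotient_num = derivative(lambda t: f(t) / g(t), x)
chain_num = derivative(lambda t: f(g(t)), x)

print(abs(product_num - product_rule))    # tiny
print(abs(quotient_num - quotient_rule))  # tiny
print(abs(chain_num - chain_rule))        # tiny
```

The quotient check works at every x here because g(x) = x² + 1 is never zero; with a denominator that vanishes somewhere, the quotient would lose differentiability at those points.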

How GATE asks this

The 2025-era favourite is an MSQ (multiple-select question): a list of functions or claims, with the instruction “select all that are differentiable” or “select all true statements.” The recurring hooks are (1) the ReLU / |x| corner, which is continuous but not differentiable at 0, and (2) the one-way implication: differentiable ⇒ continuous, but continuity does not guarantee differentiability. GATE DA 2025 tested exactly the ReLU fact. Expect combination questions too: sums, products, and compositions of differentiable functions stay differentiable.

Worked example — ReLU at the origin (GATE DA 2025)

Is ReLU(x) = max(0, x) continuous at x = 0? Is it differentiable there?

Write ReLU as a piecewise function and inspect the seam at 0:

ReLU(x) = 0   for x ≤ 0,     ReLU(x) = x   for x > 0

Continuity at 0: the left-hand limit is 0, the right-hand limit is 0, and ReLU(0) = 0. All three agree, so ReLU is continuous at 0 (and everywhere).

Differentiability at 0: compare the one-sided slopes.

left slope  (x < 0):  d/dx [0] = 0
right slope (x > 0):  d/dx [x] = 1

0  ≠  1   →   the slopes disagree   →   NOT differentiable at x = 0

So ReLU is continuous everywhere but not differentiable at x = 0, exactly at the corner. The identical reasoning applies to |x|: left slope −1, right slope +1, so it too has a non-differentiable corner at 0. Away from 0, both are differentiable.

The combination rules work the same way: h(x) = f(x)·g(x) is differentiable wherever both f and g are, so x · sin x, for example, is differentiable for all real x.
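Strictly, the one-sided slopes at 0 are limits of the difference quotient taken at 0 itself, not derivatives of the neighbouring pieces. Written with the definition of f'(a) from earlier in the lesson, with a = 0 and ReLU(0) = 0:

```latex
\begin{align*}
\lim_{h \to 0^-} \frac{\mathrm{ReLU}(0+h) - \mathrm{ReLU}(0)}{h}
  &= \lim_{h \to 0^-} \frac{0 - 0}{h} = 0 \\
\lim_{h \to 0^+} \frac{\mathrm{ReLU}(0+h) - \mathrm{ReLU}(0)}{h}
  &= \lim_{h \to 0^+} \frac{h - 0}{h} = 1
\end{align*}
```

The two one-sided limits exist but differ, so the two-sided limit, and with it the derivative at 0, does not exist.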

Quick check

Q1. Which statements are TRUE? (select all that apply)
Q2. Compute the right-hand derivative of ReLU(x) = max(0, x) at x = 0 (the slope for x > 0). (integer)
Q3. At how many points is f(x) = |x − 3| NOT differentiable? (integer)
Q4. Which of these functions are differentiable for ALL real x? (select all that apply)
Q5. The left-hand derivative of f at a is 2 and the right-hand derivative is 2. Is f differentiable at a, and if so what is f'(a)? Enter f'(a). (integer)
Q6. Which statements about combining differentiable functions are TRUE? (select all that apply)

Practice this in an interview

Compare sigmoid, tanh, ReLU, leaky ReLU, and GELU — when would you pick each?

Sigmoid squashes its input to (0, 1) and saturates at the extremes, causing vanishing gradients. Tanh is zero-centered but still saturates. ReLU avoids saturation for positive inputs and trains fast, but can produce dead neurons. Leaky ReLU keeps a small non-zero slope for negative inputs, which addresses dying neurons. GELU is smooth, weights its input by a Gaussian probability, and is a common default in transformer architectures.

What does the Universal Approximation Theorem guarantee — and what doesn't it guarantee?

The theorem proves that a single-hidden-layer network with enough neurons and a non-linear activation can approximate any continuous function on a compact domain to arbitrary precision. It guarantees existence, not learnability — it says nothing about how many neurons are needed, whether gradient descent will find the solution, or how the network will generalize.

Why does sigmoid saturation cause vanishing gradients, and why is tanh only a partial fix?

Sigmoid's derivative peaks at 0.25 and approaches zero in both tails, so the chain of gradient multiplications can shrink exponentially with depth. Tanh's derivative peaks at 1 and its output is zero-centered, which helps weight-update symmetry, but it still saturates at large magnitudes, where its gradient also shrinks to near zero.
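The peak values quoted here are easy to confirm. This is a minimal sketch using the standard closed forms σ'(x) = σ(x)(1 − σ(x)) and tanh'(x) = 1 − tanh²(x); the function names and the 10-layer figure are illustrative assumptions, and the product deliberately ignores the weight matrices that also enter a real backward pass.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


print(sigmoid_grad(0.0))                  # 0.25, the peak
print(tanh_grad(0.0))                     # 1.0, the peak
print(sigmoid_grad(6.0), tanh_grad(6.0))  # both small: saturation in the tail

# Best case for 10 stacked sigmoids, counting activation derivatives only.
print(0.25 ** 10)                         # about 9.5e-07
```

Even at its peak, each sigmoid contributes a factor of at most 0.25, whereas tanh can contribute a factor of up to 1 near zero; both fall away quickly once inputs move into the tails.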

What is GELU and why does it outperform ReLU in transformer models?

GELU (Gaussian Error Linear Unit) multiplies the input by the probability that a standard Gaussian random variable is smaller than it, GELU(x) = x · Φ(x), producing a smooth, non-monotonic curve that approximates ReLU. Unlike ReLU, it has no corner at 0, so it is differentiable everywhere. A commonly given reason for its use in transformers is that this smooth gradient near zero helps optimization in deep attention-based architectures.
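The description above translates directly into a few lines of code. This sketch assumes the exact form x · Φ(x), with Φ the standard normal CDF written via `math.erf`; libraries often offer a tanh-based approximation as well, which is not shown here.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x):
    return max(0.0, x)


# Compare the two activations at a few sample inputs.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, relu(x), round(gelu(x), 4))
```

Two features show up in the printed values: GELU dips below zero for negative inputs and then rises back toward zero (so it is not monotonic), and for large positive inputs it is close to ReLU.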
