Differentiability
A function is differentiable where it has a unique tangent slope. Differentiable implies continuous — but NOT the reverse, as ReLU and |x| show at their corner.
What you'll learn
- f is differentiable at a when f'(a) exists — a single, unique tangent slope
- Differentiable ⇒ continuous, but the converse fails (|x| and ReLU at 0)
- Sums, products, quotients (denominator ≠ 0), and compositions of differentiable functions are differentiable
- The ML tie-in: ReLU = max(0, x) is continuous everywhere but not differentiable at 0 (asked in GATE DA 2025)
Before you start
Zoom in on a smooth curve far enough and it starts to look like a straight line. That line is the tangent, and its slope is the derivative. A function is differentiable at a point when that zoom-in works — when there is one clean tangent line waiting for you at the bottom of the zoom.
Where this breaks is at corners. A sharp kink means the curve arrives with
one slope from the left and leaves with another to the right. No single tangent
exists. The most famous example in deep learning sits right at the origin: the
ReLU activation max(0, x) is continuous everywhere, but its corner at 0
makes it non-differentiable there. GATE DA 2025 asked exactly this — so it’s a
fact worth understanding rather than memorising.
Differentiable at a point
f is differentiable at a if the derivative
f'(a) = lim_{h→0} [f(a + h) − f(a)] / h
exists — meaning the limit gives one finite number regardless of whether h
approaches 0 from the left or the right. Geometrically, the secant line settles
onto a single tangent: as h → 0, the secant rotates into that unique tangent
slope.
When the left-hand slope and the right-hand slope disagree, no single limit exists, and the function is not differentiable there — even if the graph is unbroken.
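A minimal numerical sketch of the limit definition, assuming the smooth example f(x) = x² at a = 1 and a hypothetical helper named `difference_quotient` (neither comes from the lesson itself). It evaluates the secant slope for shrinking h on both sides; for this function both sequences should approach the same value, 2.

```python
def difference_quotient(f, a, h):
    """Slope of the secant through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


def square(x):
    # Illustrative smooth function (an assumption); its exact derivative at a is 2a.
    return x * x


a = 1.0
steps = [10.0 ** -k for k in range(1, 7)]  # h = 0.1, 0.01, ..., 1e-6

# Positive h approaches a from the right, negative h from the left.
right_slopes = [difference_quotient(square, a, h) for h in steps]
left_slopes = [difference_quotient(square, a, -h) for h in steps]

for h, left, right in zip(steps, left_slopes, right_slopes):
    print(f"h = {h:.0e}   left slope = {left:.6f}   right slope = {right:.6f}")
```

A finite table of slopes only suggests the limit; it does not prove that the limit exists.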
The one-way implication
Here is the relationship GATE tests most:
differentiable at a ⇒ continuous at a (always)
continuous at a ⇒ differentiable at a (NOT in general)
If a function has a tangent slope it certainly cannot jump, so differentiability
forces continuity. The reverse fails: |x| and ReLU = max(0, x) are
continuous everywhere, yet their corner at 0 kills differentiability. Continuity
is the weaker condition; differentiability is the stronger one.
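The forward direction can be justified in one line. A short derivation, using only the limit definition of f'(a) and assuming f'(a) exists as a finite number:

```latex
\lim_{h \to 0} \bigl[ f(a+h) - f(a) \bigr]
  = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} \cdot h
  = f'(a) \cdot 0
  = 0
```

So f(a + h) → f(a) as h → 0, which is exactly continuity at a.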
Building differentiable functions
Differentiability is preserved by the usual algebra, so you can certify big
expressions from small pieces. If f and g are differentiable at a, then so
are:
- their sum f + g and difference f − g,
- their product f · g,
- their quotient f / g, provided g(a) ≠ 0,
- their composition f(g(x)), provided f is also differentiable at g(a) (the chain rule).
So a product like f·g is differentiable wherever both factors are, and a
quotient inherits differentiability everywhere the denominator stays non-zero.
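As a rough sanity check of the product case, the sketch below compares a numerical slope of x · sin x against the product-rule formula sin x + x cos x at a few sample points. The helper `central_slope`, the step size, and the sample points are illustrative assumptions. Agreement at sample points is evidence only: a symmetric difference quotient can return a number even at a corner, so this does not certify differentiability.

```python
import math


def central_slope(f, a, h=1e-6):
    """Symmetric difference quotient: a numerical estimate of f'(a)."""
    return (f(a + h) - f(a - h)) / (2 * h)


def product(x):
    # f(x) = x and g(x) = sin x, both differentiable for all real x.
    return x * math.sin(x)


def product_rule(x):
    # f'(x) g(x) + f(x) g'(x), with f'(x) = 1 and g'(x) = cos x.
    return math.sin(x) + x * math.cos(x)


points = [-2.0, 0.0, 0.5, 3.0]  # arbitrary sample points
errors = [abs(central_slope(product, x) - product_rule(x)) for x in points]

for x, err in zip(points, errors):
    print(f"x = {x:5.1f}   |numerical - product rule| = {err:.2e}")
```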
How GATE asks this
The 2025-era favourite is an MSQ: a list of functions or claims, “select all
that are differentiable” or “select all true statements.” The recurring hooks are
(1) the ReLU / |x| corner — continuous but not differentiable at 0 — and
(2) the one-way implication (differentiable ⇒ continuous, but not conversely in general).
GATE DA 2025 tested exactly the ReLU fact. Expect also combination questions:
sums, products, and compositions of differentiable functions stay differentiable.
Worked example — ReLU at the origin (GATE DA 2025)
Is ReLU(x) = max(0, x) continuous at x = 0? Is it differentiable there?
Write ReLU as a piecewise function and inspect the seam at 0:
ReLU(x) = 0 for x ≤ 0, ReLU(x) = x for x > 0
Continuity at 0: the left limit gives 0, the right limit gives 0, and
ReLU(0) = 0 — all three agree, so ReLU is continuous at 0 (and everywhere).
Differentiability at 0: compare the one-sided slopes.
left slope (x < 0): d/dx [0] = 0
right slope (x > 0): d/dx [x] = 1
0 ≠ 1 → the slopes disagree → NOT differentiable at x = 0
So ReLU is continuous everywhere but not differentiable at x = 0 — the corner
exactly. The identical reasoning applies to |x|: left slope −1, right slope
+1, so it too has a non-differentiable corner at 0. (Away from 0, both are
perfectly differentiable.) For a combination such as h(x) = f(x)·g(x), h is
differentiable wherever both f and g are — e.g. x · sin x is
differentiable for all real x.
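The one-sided slopes in this worked example can be reproduced numerically. This is a minimal sketch, assuming a hypothetical helper `one_sided_slopes` and a fixed small step h = 1e-6 in place of a true limit; it evaluates the left and right difference quotients of ReLU and |x| at 0, and of ReLU at a point away from the corner.

```python
def relu(x):
    return max(0.0, x)


def one_sided_slopes(f, a, h=1e-6):
    """Return (left, right) difference quotients of f at a for a small step h."""
    left = (f(a - h) - f(a)) / (-h)
    right = (f(a + h) - f(a)) / h
    return left, right


# At the corner the two sides disagree.
relu_left, relu_right = one_sided_slopes(relu, 0.0)
abs_left, abs_right = one_sided_slopes(abs, 0.0)
print("ReLU at 0:", relu_left, relu_right)
print("|x|  at 0:", abs_left, abs_right)

# Away from the corner the two sides agree, up to floating-point rounding.
away_left, away_right = one_sided_slopes(relu, 2.0)
print("ReLU at 2:", away_left, away_right)
```

Because both functions are piecewise linear, the fixed step already gives the exact one-sided slopes at 0; for a curved function the quotients would only approximate them.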
Practice this in an interview
Sigmoid squashes to (0,1) and saturates at extremes, causing vanishing gradients. Tanh is zero-centered but still saturates. ReLU avoids saturation for positive inputs and trains fast but can produce dead neurons. Leaky ReLU fixes dying neurons. GELU is smooth and probabilistic, now the default in most transformer architectures.
The universal approximation theorem proves that a single-hidden-layer network with enough neurons and a non-linear activation can approximate any continuous function on a compact domain to arbitrary precision. It guarantees existence, not learnability: it says nothing about how many neurons are needed, whether gradient descent will find the solution, or how the network will generalize.
Sigmoid's derivative peaks at 0.25 and approaches zero in both tails, so the chain of gradient multiplications can collapse exponentially in deep networks. Tanh's derivative peaks at 1 and its output is zero-centered, which helps weight update symmetry, but it still saturates at large magnitudes, where the gradient shrinks to near zero.
GELU (Gaussian Error Linear Unit) multiplies the input by the probability that a standard Gaussian random variable is smaller than it, producing a smooth, non-monotonic curve that approximates ReLU but with a stochastic regularization flavor. Transformers favor GELU because the smooth gradient near zero improves optimization in deep attention-based architectures.