Lipschitz & One-Insight Problems
Some GATE calculus problems look brutal but turn on a single idea. The signature one: a squared bound |f(x) − f(y)| ≤ C(x − y)² secretly forces f to be constant.
What you'll learn
- How to spot a 'looks hard, one idea' calculus problem on the exam
- A squared bound |f(x) − f(y)| ≤ C(x − y)² divides down to force f'(x) = 0 everywhere
- f' ≡ 0 on an interval ⇒ f is constant, so any difference f(b) − f(a) = 0
- Why the SQUARE matters: a linear bound |f(x) − f(y)| ≤ C|x − y| only bounds the slope, it does NOT force constancy
Before you start
Every year, GATE slips in a calculus question that looks brutal on first read. An
abstract function f, no formula, a strange-looking inequality, and a request for
something specific. There’s nothing to differentiate, nothing to plug in. It seems
unfair.
It isn’t. These problems are one-trick ponies. Each one hinges on a single insight that, once you see it, collapses the whole thing to a one-line answer. This lesson teaches the most famous example — and, more usefully, how to spot the species.
The star concept here, Lipschitz bounds (a cap on how fast a function can change), is not just exam trivia: bounding a function’s or gradient’s Lipschitz constant is how modern ML keeps training stable — it underpins WGAN’s weight clipping, certified robustness bounds, and convergence guarantees for gradient descent.
The signature problem: a squared bound
Here is the setup that has appeared more than once. You are told a function satisfies
|f(x) − f(y)| ≤ C · (x − y)² for all real x, y
for some constant C. There is no formula for f. The question asks for something
like f(1) − f(0). It looks underspecified — but the squared (x − y)² is doing all
the work.
The one idea: divide, then take the limit
The derivative of f at a point x is the limit of the difference quotient. So look
at that quotient and feed it the bound. For y ≠ x, divide both sides by |x − y|:
| f(x) − f(y) |
| ----------- | ≤ C · |x − y|
| x − y |
The left side is the difference quotient whose limit (as y → x) is |f'(x)|.
The right side C·|x − y| goes to 0 as y → x. Squeezed between, the slope has
nowhere to go:
0 ≤ |f'(x)| ≤ lim (C·|x − y|) = 0 ⇒ f'(x) = 0 for every x
A function whose derivative is zero everywhere is constant. So f is the same
value at every point, and therefore any difference vanishes:
f'(x) = 0 everywhere ⇒ f is constant ⇒ f(1) − f(0) = 0
How GATE asks this
Almost always a NAT whose answer is a clean 0. The wrapper changes — it might
ask for f(2) − f(−3), or “how many such non-constant f exist?”, or give the bound
as |f(x) − f(y)| ≤ 5(x − y)² — but the engine is identical: divide by |x − y|,
let y → x, conclude f' ≡ 0, so f is constant and the difference is 0. The
skill GATE is testing is recognition: an abstract f plus a squared (or higher
power) bound is the tell.
Worked example — a real GATE DA question
A function
fsatisfies|f(x) − f(y)| ≤ (x − y)²for all realx, y. Findf(1) − f(0).
Run the pattern. Fix x, take any y ≠ x, and divide the given bound by |x − y|:
| f(x) − f(y) |
| ----------- | ≤ |x − y|
| x − y |
Now let y → x. The left side tends to |f'(x)| (the definition of the derivative),
and the right side |x − y| → 0. By the squeeze,
|f'(x)| ≤ 0 ⇒ f'(x) = 0 for every x.
So f has zero slope everywhere and is therefore constant. A constant function
takes the same value at 1 and at 0, hence
f(1) − f(0) = 0.
The answer is 0. This is a real GATE DA NAT (2024/2025 style) — daunting on first read, a single line once you spot that the squared bound kills the derivative.
Quick check
Quick check
Practice this in an interview
All questionsSigmoid's derivative peaks at 0.25 and approaches zero in both tails, so the chain of gradient multiplications collapses exponentially in deep networks. Tanh's derivative peaks at 1 and is zero-centered, which helps weight update symmetry, but it still saturates at large magnitudes and the gradient still shrinks to near-zero in both tails.
The theorem proves that a single-hidden-layer network with enough neurons and a non-linear activation can approximate any continuous function on a compact domain to arbitrary precision. It guarantees existence, not learnability — it says nothing about how many neurons are needed, whether gradient descent will find the solution, or how the network will generalize.
Sigmoid squashes to (0,1) and saturates at extremes, causing vanishing gradients. Tanh is zero-centered but still saturates. ReLU avoids saturation for positive inputs and trains fast but can produce dead neurons. Leaky ReLU fixes dying neurons. GELU is smooth and probabilistic, now the default in most transformer architectures.
An LSTM maintains a cell state that flows through time via additive updates controlled by learned gates, giving gradients a near-linear path across many steps. The forget, input, and output gates let the network selectively retain, write, and expose information rather than crushing every signal through a squashing non-linearity at every step.