datarekha

Lipschitz & One-Insight Problems

Some GATE calculus problems look brutal but turn on a single idea. The signature one: a squared bound |f(x) − f(y)| ≤ C(x − y)² secretly forces f to be constant.

7 min read Advanced GATE DA Lesson 46 of 122

What you'll learn

  • How to spot a 'looks hard, one idea' calculus problem on the exam
  • A squared bound |f(x) − f(y)| ≤ C(x − y)² divides down to force f'(x) = 0 everywhere
  • f' ≡ 0 on an interval ⇒ f is constant, so any difference f(b) − f(a) = 0
  • Why the SQUARE matters: a linear bound |f(x) − f(y)| ≤ C|x − y| only bounds the slope, it does NOT force constancy

Before you start

Every year, GATE slips in a calculus question that looks brutal on first read. An abstract function f, no formula, a strange-looking inequality, and a request for something specific. There’s nothing to differentiate, nothing to plug in. It seems unfair.

It isn’t. These problems are one-trick ponies. Each one hinges on a single insight that, once you see it, collapses the whole thing to a one-line answer. This lesson teaches the most famous example — and, more usefully, how to spot the species.

The star concept here, Lipschitz bounds (a cap on how fast a function can change), is not just exam trivia: bounding a function’s or gradient’s Lipschitz constant is how modern ML keeps training stable — it underpins WGAN’s weight clipping, certified robustness bounds, and convergence guarantees for gradient descent.

The signature problem: a squared bound

Here is the setup that has appeared more than once. You are told a function satisfies

|f(x) − f(y)| ≤ C · (x − y)²      for all real x, y

for some constant C. There is no formula for f. The question asks for something like f(1) − f(0). It looks underspecified — but the squared (x − y)² is doing all the work.

The one idea: divide, then take the limit

The derivative of f at a point x is the limit of the difference quotient. So look at that quotient and feed it the bound. For y ≠ x, divide both sides by |x − y|:

| f(x) − f(y) |
| ----------- |  ≤  C · |x − y|
|    x − y    |

The left side is the difference quotient whose limit (as y → x) is |f'(x)|. The right side C·|x − y| goes to 0 as y → x. Squeezed between, the slope has nowhere to go:

0 ≤ |f'(x)| ≤ lim (C·|x − y|) = 0      ⇒      f'(x) = 0   for every x

A function whose derivative is zero everywhere is constant. So f is the same value at every point, and therefore any difference vanishes:

f'(x) = 0 everywhere   ⇒   f is constant   ⇒   f(1) − f(0) = 0
slope 0 everywhere → f is constantx = 0x = 1f(1) − f(0) = 0
A zero-slope function is a flat line — every value equals every other, so all differences are 0.

How GATE asks this

Almost always a NAT whose answer is a clean 0. The wrapper changes — it might ask for f(2) − f(−3), or “how many such non-constant f exist?”, or give the bound as |f(x) − f(y)| ≤ 5(x − y)² — but the engine is identical: divide by |x − y|, let y → x, conclude f' ≡ 0, so f is constant and the difference is 0. The skill GATE is testing is recognition: an abstract f plus a squared (or higher power) bound is the tell.

Worked example — a real GATE DA question

A function f satisfies |f(x) − f(y)| ≤ (x − y)² for all real x, y. Find f(1) − f(0).

Run the pattern. Fix x, take any y ≠ x, and divide the given bound by |x − y|:

| f(x) − f(y) |
| ----------- |  ≤  |x − y|
|    x − y    |

Now let y → x. The left side tends to |f'(x)| (the definition of the derivative), and the right side |x − y| → 0. By the squeeze,

|f'(x)| ≤ 0   ⇒   f'(x) = 0      for every x.

So f has zero slope everywhere and is therefore constant. A constant function takes the same value at 1 and at 0, hence

f(1) − f(0) = 0.

The answer is 0. This is a real GATE DA NAT (2024/2025 style) — daunting on first read, a single line once you spot that the squared bound kills the derivative.

Quick check

Quick check

0/6
Q1f satisfies |f(x) − f(y)| ≤ (x − y)² for all real x, y. Find f(1) − f(0). (NAT)numerical answer — type a number
Q2f satisfies |f(x) − f(y)| ≤ 7(x − y)² for all x, y. Find f(2) − f(−3). (NAT)numerical answer — type a number
Q3Why does the bound |f(x) − f(y)| ≤ C(x − y)² force f to be constant? Which statements are correct? (select all that apply)select all that apply
Q4A function g satisfies |g(x) − g(y)| ≤ 3|x − y| for all x, y. Which statements are correct? (select all that apply)select all that apply
Q5How many NON-constant functions f satisfy |f(x) − f(y)| ≤ (x − y)² for all real x, y? (NAT)numerical answer — type a number
Q6Which feature of the bound |f(x) − f(y)| ≤ C(x − y)^p is responsible for forcing f to be constant?

Practice this in an interview

All questions
Why does sigmoid saturation cause vanishing gradients, and why is tanh only a partial fix?

Sigmoid's derivative peaks at 0.25 and approaches zero in both tails, so the chain of gradient multiplications collapses exponentially in deep networks. Tanh's derivative peaks at 1 and is zero-centered, which helps weight update symmetry, but it still saturates at large magnitudes and the gradient still shrinks to near-zero in both tails.

What does the Universal Approximation Theorem guarantee — and what doesn't it guarantee?

The theorem proves that a single-hidden-layer network with enough neurons and a non-linear activation can approximate any continuous function on a compact domain to arbitrary precision. It guarantees existence, not learnability — it says nothing about how many neurons are needed, whether gradient descent will find the solution, or how the network will generalize.

Compare sigmoid, tanh, ReLU, leaky ReLU, and GELU — when would you pick each?

Sigmoid squashes to (0,1) and saturates at extremes, causing vanishing gradients. Tanh is zero-centered but still saturates. ReLU avoids saturation for positive inputs and trains fast but can produce dead neurons. Leaky ReLU fixes dying neurons. GELU is smooth and probabilistic, now the default in most transformer architectures.

How do LSTM gates solve the vanishing gradient problem?

An LSTM maintains a cell state that flows through time via additive updates controlled by learned gates, giving gradients a near-linear path across many steps. The forget, input, and output gates let the network selectively retain, write, and expose information rather than crushing every signal through a squashing non-linearity at every step.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content