datarekha

Backpropagation (One Step)

Backprop is the chain rule walked backward over a computation graph. GATE asks for one partial derivative through a small net — here is the exact recipe.

8 min read Advanced GATE DA Lesson 93 of 122

What you'll learn

  • Backprop is the chain rule applied over a computation graph, right to left
  • For one path, ∂L/∂w is the product of local derivatives along that path
  • ReLU's local derivative is 1 when its input was positive, else 0
  • Computing one partial derivative through a tiny 2-layer net by hand

Before you start

Training a network means nudging every weight in the direction that lowers the loss — which needs ∂L/∂w for each weight w. Backpropagation computes those gradients efficiently by treating the network as a computation graph and applying the chain rule backward, layer by layer, from the loss toward the inputs.
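To make "nudging a weight in the direction that lowers the loss" concrete, here is a minimal sketch of one update step. The one-weight loss L(w) = (w − 3)², the function names, and the learning rate `lr` are all assumptions made for illustration; none of them come from the lesson.

```python
# Hypothetical one-weight example: L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3).
def loss(w):
    return (w - 3.0) ** 2


def loss_grad(w):
    return 2.0 * (w - 3.0)


def gradient_step(w, grad, lr=0.1):
    """Move w a small step against the gradient (lr is an assumed learning rate)."""
    return w - lr * grad


w_old = 0.0
w_new = gradient_step(w_old, loss_grad(w_old))

# The loss after the step is smaller than the loss before it.
print(loss(w_old), loss(w_new))
```

In a real network the gradient would come from backpropagation rather than a hand-written formula, but the update itself has this shape.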

The chain rule along a path

The forward pass sends values left to right. Backprop sends gradients right to left: each edge carries a local derivative, and the gradient of the loss with respect to any quantity is the product of those local derivatives along the path back to it.

[Figure: computation graph x → h → y → L (loss), with h = ReLU(w₁·x) and y = w₂·h. Edge labels: ∂·/∂w₁: x and ∂y/∂h = w₂. Multiply local derivatives along the path: that is backprop for one weight.]
Forward arrows build values; the gradient to a weight is the product of local derivatives back along its path.

For a weight w on one path to the loss:

∂L/∂w = (∂L/∂output) · (∂output/∂hidden) · … · (∂·/∂w)

Each factor is a local derivative — the derivative of one node with respect to its immediate input. The only non-obvious one here is the activation: ReLU’s local derivative is 1 if its input was positive, else 0. It acts as a gate that either passes the upstream gradient through unchanged or blocks it.
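The gate behaviour and the "product along a path" rule can be sketched in a few lines. The helper names `relu_grad` and `path_gradient` are hypothetical, and the numbers are arbitrary illustration values.

```python
from math import prod


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    # Local derivative of ReLU: 1 if the input was positive, else 0.
    return 1.0 if z > 0 else 0.0


def path_gradient(local_derivatives):
    # The gradient along one path is the product of its local derivatives.
    return prod(local_derivatives)


upstream = 5.0
passed = upstream * relu_grad(2.0)    # gate open: upstream gradient unchanged
blocked = upstream * relu_grad(-2.0)  # gate closed: gradient becomes zero

print(passed, blocked)
print(path_gradient([0.5, relu_grad(2.0), 4.0]))
```

Because the gate is just one more factor in the product, a single closed gate zeroes the gradient for every weight that lies behind it on that path.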

The mechanism is easiest to see on a single neuron: run the forward pass to fill in each value, then run the backward pass and watch every gradient form as the upstream gradient × the local derivative on the edge.
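A forward-then-backward trace on a single neuron might look like the sketch below. The squared-error loss L = ½(h − target)², the `target` value, and the dictionary-based cache are assumptions chosen for illustration; the lesson does not specify a loss.

```python
def forward(x, w, target):
    """Forward pass: compute and store every intermediate value."""
    z = w * x                        # pre-activation
    h = max(0.0, z)                  # ReLU
    loss = 0.5 * (h - target) ** 2   # assumed squared-error loss
    return {"x": x, "z": z, "h": h, "target": target, "loss": loss}


def backward(cache):
    """Backward pass: each gradient = upstream gradient * local derivative."""
    dL_dh = cache["h"] - cache["target"]       # derivative of the assumed loss
    dh_dz = 1.0 if cache["z"] > 0 else 0.0     # ReLU gate
    dL_dz = dL_dh * dh_dz
    dL_dw = dL_dz * cache["x"]                 # d(w*x)/dw = x
    return {"dL_dh": dL_dh, "dL_dz": dL_dz, "dL_dw": dL_dw}


cache = forward(x=2.0, w=1.5, target=1.0)
grads = backward(cache)
print(cache)
print(grads)
```

The backward pass reuses the values stored by the forward pass, which is why the forward pass has to run first.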

How GATE asks this

A NAT or MCQ: a tiny network is specified with concrete weights and an input, and you compute one partial derivative (often ∂y/∂w or ∂L/∂w) by the chain rule. The graph is small enough to trace by hand — the skill being tested is identifying the path and multiplying the local derivatives, plus remembering the ReLU gate. This pattern appeared on GATE DA 2025.

Worked example — one chain-rule step

A tiny net: input x = 2. Hidden unit h = ReLU(w₁·x) with w₁ = 1, so h = ReLU(2) = 2. Output y = w₂·h with w₂ = 3, so y = 6. Compute ∂y/∂w₁ by the chain rule.

The path from w₁ to y is w₁ → (w₁x) → h → y. Multiply the local derivative on each edge:

∂y/∂h        = w2 = 3                 (since y = w2·h)
∂h/∂(w1·x)   = 1                      (ReLU input is 2 > 0, so gate = 1)
∂(w1·x)/∂w1  = x = 2                  (linear in w1)

∂y/∂w1 = (∂y/∂h) · (∂h/∂(w1·x)) · (∂(w1·x)/∂w1)
       = 3 · 1 · 2
       = 6

So ∂y/∂w₁ = 6. Notice the ReLU gate was open (input 2 > 0), so it contributed a factor of 1 and simply passed the gradient through. Had the ReLU input been negative, the gate would be 0 and the whole gradient would vanish.
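One way to sanity-check a hand-computed derivative like this is to compare it against a central finite difference. The sketch below uses the numbers from the worked example; the function names and the step size `eps` are assumptions made for illustration.

```python
def net_output(w1, w2, x):
    """y = w2 * ReLU(w1 * x), the tiny net from the worked example."""
    h = max(0.0, w1 * x)
    return w2 * h


def dy_dw1(w1, w2, x):
    """Chain rule: (dy/dh) * (dh/d(w1*x)) * (d(w1*x)/dw1)."""
    gate = 1.0 if w1 * x > 0 else 0.0
    return w2 * gate * x


w1, w2, x = 1.0, 3.0, 2.0
analytic = dy_dw1(w1, w2, x)

# Central difference: perturb w1 slightly in both directions.
eps = 1e-6
numeric = (net_output(w1 + eps, w2, x) - net_output(w1 - eps, w2, x)) / (2 * eps)

print(analytic, numeric)  # the two values agree up to floating-point error
```

The finite-difference check is only reliable away from the ReLU kink at pre-activation 0, where the function is not differentiable.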

Quick check

Q1. A net: x = 3, h = ReLU(w₁·x) with w₁ = 2 (so h = ReLU(6) = 6), y = w₂·h with w₂ = 4. Compute ∂y/∂w₁. (Numerical answer)
Q2. Same architecture, but now the ReLU input is negative: x = −1, w₁ = 2 (pre-activation = −2), y = w₂·h with w₂ = 5. Compute ∂y/∂w₁. (Numerical answer)
Q3. For y = w₂·h with w₂ = 7, what is the local derivative ∂y/∂h? (Numerical answer)
Q4. Which statements about a single backprop step are TRUE? (Select all that apply)
Q5. A net: x = 1, h = ReLU(w₁·x) with w₁ = 4 (h = 4), y = w₂·h with w₂ = 2 (y = 8). Compute ∂y/∂w₂. (Numerical answer)
Q6. In the net x → (w₁·x) → ReLU → h → (w₂·h) → y, why can ∂y/∂w₁ be exactly 0 even when w₂ ≠ 0 and x ≠ 0?
