What does a single artificial neuron (perceptron) actually compute?

A neuron takes a weighted sum of its inputs, adds a bias, and passes the result through an activation function. The weights encode learned feature importance, the bias shifts the decision boundary, and the activation introduces the non-linearity needed for complex mappings.
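As a minimal sketch of that computation (the function names, the sigmoid choice, and the numbers are illustrative assumptions, not taken from the lesson):

```python
import math

def sigmoid(z):
    """Smooth activation that squashes any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)

# Hypothetical inputs and weights, chosen so the score is easy to check by hand.
x = [2.0, 1.0]
w = [0.5, -1.0]
b = 0.0
out = neuron(x, w, b, sigmoid)  # z = 0.5*2 + (-1)*1 + 0 = 0, and sigmoid(0) = 0.5
```

Swapping `sigmoid` for a hard sign function turns this same unit into the perceptron described later in the lesson.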

Walk me through the forward pass of a neural network end-to-end.

The forward pass feeds an input through every layer in sequence: each layer computes a linear transform followed by an activation, caching the intermediate values needed later for backpropagation. The final layer produces a prediction, which is compared to the label via a loss function.
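A small sketch of this flow, assuming a hypothetical 2-2-1 network with a ReLU hidden layer, an identity output, and a squared-error loss (all of these choices, and the weights, are made up for illustration):

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense(x, W, b, act):
    """One layer: linear transform W x + b, then an elementwise activation."""
    z = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    a = [act(zi) for zi in z]
    return z, a

def forward(x, layers):
    """Feed x through every layer in order, caching (input, z, output) per layer."""
    cache = []
    a = x
    for W, b, act in layers:
        z, out = dense(a, W, b, act)
        cache.append((a, z, out))  # kept for a later backward pass
        a = out
    return a, cache

# Hypothetical weights: hidden layer (2 units), then a single output unit.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
    ([[2.0, 1.0]], [0.5], identity),
]
x = [3.0, 1.0]
y_true = 4.0

pred, cache = forward(x, layers)
loss = 0.5 * (pred[0] - y_true) ** 2  # compare the prediction to the label
```

Here the hidden layer produces (2, 1), the output is 5.5, and the loss is 1.125; the cache holds one entry per layer.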

What is backpropagation and how does the chain rule make it work?

Backpropagation is the algorithm that computes the gradient of the loss with respect to every parameter by applying the chain rule layer by layer in reverse. It turns a single backward pass through the computation graph into exact gradients for all weights simultaneously.
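To make the chain rule concrete, here is a sketch on a deliberately tiny, hypothetical network: one sigmoid hidden unit feeding one linear output, with a squared-error loss. The names and numbers are assumptions for illustration. The backward pass reuses each upstream gradient as it moves toward the input, and a finite-difference check confirms the result.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x):
    h = sigmoid(w1 * x)   # hidden activation
    out = w2 * h          # linear output
    return h, out

def loss(w1, w2, x, y):
    _, out = forward(w1, w2, x)
    return 0.5 * (out - y) ** 2

def backward(w1, w2, x, y):
    """Chain rule applied in reverse: output first, then the hidden unit."""
    h, out = forward(w1, w2, x)
    d_out = out - y                   # dL/d_out
    d_w2 = d_out * h                  # dL/dw2 = dL/d_out * d_out/dw2
    d_h = d_out * w2                  # gradient handed back to the hidden unit
    d_w1 = d_h * h * (1.0 - h) * x    # through the sigmoid, then through w1 * x
    return d_w1, d_w2

# Hypothetical parameter values and a single training pair.
w1, w2, x, y = 0.3, -0.7, 2.0, 1.0
grad_w1, grad_w2 = backward(w1, w2, x, y)

# Central finite differences as an independent check on both gradients.
eps = 1e-6
num_w1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
num_w2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
```

The analytic and numerical gradients should agree to several decimal places, which is the usual sanity check for a hand-written backward pass.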

How does dropout work, and why must it behave differently during training and inference?

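Dropout randomly zeroes each neuron's output with probability p during training, forcing the network to learn redundant representations and preventing co-adaptation of neurons. At inference, dropout is disabled and all neurons are active. To keep expected activations consistent between the two modes, the common "inverted dropout" implementation scales the surviving outputs by 1/(1−p) during training, so inference needs no rescaling; the original formulation instead leaves training outputs unscaled and multiplies by (1−p) at inference. Forgetting to switch modes means units are still being dropped at inference, which produces noisy, degraded predictions.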
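A minimal sketch of the two modes, assuming the inverted-dropout convention (survivors are scaled by 1/(1−p) during training, and inference is a plain pass-through). The function name and signature are hypothetical, not any library's API.

```python
import random

def dropout(activations, p, training, rng=random):
    """Inverted dropout: zero each value with probability p while training and
    scale survivors by 1/(1-p); at inference, return the values unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)            # seeded only to make the demo repeatable
acts = [1.0] * 10000

train_out = dropout(acts, 0.5, training=True, rng=rng)
train_mean = sum(train_out) / len(train_out)   # close to 1.0 in expectation

infer_out = dropout(acts, 0.5, training=False)  # identical to the input
```

Because the scaling happens at training time, the average training activation stays near the inference value, which is why the inference path can leave the activations alone.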

Perceptron & the Update Rule — GATE DA

What you'll learn

The perceptron predicts the sign of the linear score: ŷ = sign(wᵀx + b)

It learns by the update rule w ← w + η(y − ŷ)x, applied only to misclassified points

Each update rotates the decision boundary toward classifying the missed point correctly

It converges only if the data is linearly separable — a single layer cannot solve XOR

Last lesson asked for the simplest unit that learns in the plainest sense — watches its own mistakes and corrects. Here it is, and it happens to be the original artificial neuron, the ancestor of every neural network. Its prediction is brutally simple: compute the linear score z = wᵀx + b, then output its sign. If z is positive, predict +1; if negative, predict −1. The boundary z = 0 is a line, just as in logistic regression — but instead of a smooth probability the perceptron commits to a hard ±1.

What made it historic is how it learns. No calculus, no probability, no loss to descend — just a tiny correction applied every time it gets a point wrong, repeated until it stops making mistakes, the way you twitch a steering wheel back each time the car drifts off its lane.

Predict with a sign, learn from mistakes

The prediction:

ŷ = sign(wᵀx + b)   →   +1 if wᵀx + b > 0,   −1 otherwise
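In code, the prediction is a few lines. This sketch uses a hypothetical `predict` helper and follows the convention above that a score of exactly zero maps to −1:

```python
def predict(w, b, x):
    """Return +1 if the linear score w.x + b is strictly positive, else -1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else -1

w, b = (1, 0), -3          # the weights used later in the worked example

p_neg = predict(w, b, (2, 1))    # score 2 - 3 = -1, so -1
p_pos = predict(w, b, (4, 1))    # score 4 - 3 = +1, so +1
p_zero = predict(w, b, (3, 0))   # score exactly 0, falls in the "otherwise" case
```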

The learning rule walks through the training points. When the prediction ŷ matches the true label y, do nothing. When it is wrong, nudge the weights:

w ← w + η(y − ŷ)x

The error term (y − ŷ) is zero on correct points, so only mistakes change the weights.

The key is the error term (y − ŷ). When the prediction is right, y − ŷ = 0 and the weights do not move. When it is wrong — say y = +1 but ŷ = −1 — y − ŷ = +2, so we add a multiple of x to w. That pushes the score wᵀx up for this exact point, dragging it toward the positive side. Geometrically, each update rotates the decision boundary toward correctly classifying the point it just missed.

Figure: the positive point sat on the wrong side of the solid line; after one update, the dashed boundary has rotated so that the point is now classified +.

This repeats over the data, pass after pass. The Perceptron Convergence Theorem guarantees the process halts with zero errors — but only if the classes are linearly separable. If no straight line can split them, the perceptron never settles; some point is always wrong, so the weights never stop moving.
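The contrast between separable and non-separable data can be sketched with a short training loop. Two assumptions are worth flagging: the bias is updated alongside the weights with b ← b + η(y − ŷ), a standard companion rule that the lesson does not state, and a `max_epochs` cap is added so the loop stops on data where it would otherwise run forever. The AND and XOR datasets use 0/1 inputs and ±1 labels.

```python
def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else -1

def train_perceptron(data, eta=1.0, max_epochs=100):
    """Repeat passes over the data until one pass makes no mistakes.
    Returns (w, b, converged)."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            y_hat = predict(w, b, x)
            if y_hat != y:
                step = eta * (y - y_hat)          # +2*eta or -2*eta on a mistake
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step                         # assumed bias update (see above)
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False                            # cap hit: never error-free

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

w_and, b_and, and_ok = train_perceptron(AND)   # linearly separable: halts
w_xor, b_xor, xor_ok = train_perceptron(XOR)   # not separable: hits the cap
```

On AND the loop stops after a handful of passes with every point classified correctly; on XOR every pass contains at least one mistake, so only the epoch cap ends it.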

How GATE asks this

Almost always an MCQ or NAT asking for the effect of a single update: given w, b, a misclassified point x, its true label y, and the learning rate η, compute the new weights or show that the score wᵀx moves toward the correct side. This single-neuron update is also the building block behind the neural-network questions GATE DA has asked every year (2024–2026). A conceptual variant asks why a single-layer perceptron cannot learn XOR — the answer being that XOR is not linearly separable.

Worked example — one update step

Current weights w = (1, 0) with bias b = −3. A point x = (2, 1) has true label y = +1. With learning rate η = 1, perform one update and check that wᵀx improves.

First confirm it is a mistake. The score is z = wᵀx + b = (1)(2) + (0)(1) − 3 = 2 − 3 = −1, which is negative, so the perceptron predicts ŷ = −1 — wrong, since the true label is +1. Apply the update. The error term is y − ŷ = +1 − (−1) = 2:

w_new = w + η·(y − ŷ)·x
      = (1, 0) + 1 · 2 · (2, 1)
      = (1, 0) + (4, 2)
      = (5, 2)

Now recompute the weighted score wᵀx for the same point with the new weights:

old wᵀx = (1)(2) + (0)(1) = 2
new wᵀx = (5)(2) + (2)(1) = 10 + 2 = 12

The score jumped from 2 to 12, far more strongly positive. With the bias still at b = −3, the full score is now z = 12 − 3 = 9 > 0, so the point is classified +1. The weight vector grew in the direction of x, which is what rotates the boundary toward the point it had been missing.
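The arithmetic of this worked example can be checked with a few lines of Python. The variable names are illustrative; the numbers are the ones given above, and the bias is left unchanged, as in the example.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

w, b = (1, 0), -3
x, y = (2, 1), 1
eta = 1

z_old = dot(w, x) + b                  # 2 - 3 = -1
y_hat = 1 if z_old > 0 else -1         # -1, a mistake since y = +1

# w_new = w + eta * (y - y_hat) * x, applied component by component
w_new = tuple(wi + eta * (y - y_hat) * xi for wi, xi in zip(w, x))

old_wx = dot(w, x)                     # 2
new_wx = dot(w_new, x)                 # 12
z_new = new_wx + b                     # 12 - 3 = 9, now on the positive side
```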

In one breath

The perceptron — the original neuron — predicts ŷ = sign(wᵀx + b), a hard ±1, and learns with no calculus by the rule w ← w + η(y − ŷ)x applied only to misclassified points (the error term y − ŷ is 0 when correct), each correction rotating the boundary toward the point it missed; the Convergence Theorem guarantees it halts with zero errors iff the data is linearly separable, which is also its fatal limit — it can never learn XOR, because no single line splits it.

Practice

Quick check

Q1 (Recall): Which statements about the single-layer perceptron are TRUE? (select all that apply)

Q2 (Recall): When the perceptron's prediction ŷ already equals the true label y, what does the update rule w ← w + η(y − ŷ)x do?

Q3 (Recall): A perceptron is trained on data that is NOT linearly separable. What happens?

Q4 (Trace): Weights w = (1, 0), learning rate η = 1. A misclassified point x = (2, 1) has true label y = +1 and prediction ŷ = −1. After one update, what is the first component of the new weight vector w_new? (numerical answer)

Q5 (Trace): Continuing the example, after the update to w = (5, 2), what is the new score wᵀx for the same point x = (2, 1)? (numerical answer)

Q6 (Apply): Weights w = (0, 1), η = 1. A point x = (3, −2) with true label y = +1 is misclassified as ŷ = −1. What is the SECOND component of w_new? (numerical answer)

A question to carry forward

The perceptron learns — and then slams into a wall named XOR. Four points, two classes, and not one straight line on Earth can separate them. The single neuron loops forever, defeated by a problem a child solves at a glance.

The fix is the idea that launched deep learning: stop asking one neuron to do everything. Feed the outputs of several perceptrons into another perceptron — a hidden layer between input and output — and suddenly curved, XOR-shaped regions come within reach. But the hard sign step has to go first: it is flat almost everywhere, so it tells a deeper network nothing about how to improve. Here is the thread onward: how does stacking neurons into layers buy the power to carve any region, what smooth activation must replace the sign so the stack can be trained, and what does a multi-layer network actually compute as a signal flows through it?
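As a preview of the stacking idea, here is a sketch in which two hidden perceptrons feed a third. The weights are hand-picked for illustration, not learned: one hidden unit acts as OR, the other as NAND, and the output unit fires only when both do.

```python
def sign_unit(w, b, x):
    """A single perceptron: +1 if w.x + b > 0, else -1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else -1

def xor_net(x):
    # Hidden layer: two perceptrons looking at the same 0/1 inputs.
    h_or = sign_unit((1, 1), -0.5, x)      # +1 unless both inputs are 0
    h_nand = sign_unit((-1, -1), 1.5, x)   # +1 unless both inputs are 1
    # Output perceptron: +1 only when both hidden units say +1.
    return sign_unit((1, 1), -1, (h_or, h_nand))

XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
outputs = [xor_net(x) for x, _ in XOR]
```

Each hidden unit draws one straight line; the output unit combines the two half-planes into the band between them, which is exactly the region XOR needs. These weights had to be set by hand, because the hard sign gives no gradient to learn them from.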
