How does k-nearest neighbours work, and why is it called a lazy learner?

KNN stores the entire training set and defers all computation to prediction time: for a new point it finds the k closest training examples by distance, then returns the majority class (classification) or mean value (regression). It is called lazy because there is no training phase — the model is the data itself.

Why is KNN called a lazy learner, and what are the practical tradeoffs at inference time?

KNN is lazy because it does no real training; it just stores the training data and defers all computation to prediction time, when it searches for the nearest neighbors. The tradeoff is fast (zero) training but slow, memory-heavy inference that scales with dataset size. Approximate nearest-neighbor indexes and dimensionality reduction make it practical at scale.

What's the difference between k-means and k-nearest neighbors? People confuse them.

K-means is an unsupervised clustering algorithm that partitions unlabeled data into k groups by iteratively updating centroids. KNN is a supervised algorithm that classifies or predicts a new point using the labels of its k closest training points. They share the letter k and the use of distances but solve completely different problems.

In KNN, how do you choose k, and how does the curse of dimensionality affect it?

Choose k by cross-validation, balancing a small k (low bias, high variance, noisy) against a large k (smoother, higher bias); an odd k avoids ties in binary classification. The curse of dimensionality hurts KNN because in high dimensions all points become nearly equidistant, so 'nearest' loses meaning and accuracy degrades. Feature scaling and dimensionality reduction help.
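One way to picture "choose k by cross-validation" is leave-one-out: hide each training point in turn, predict it from the rest, and keep the k that gets the most points right. The sketch below is only an illustration under stated assumptions: the 1-D data set is made up (with one deliberately stray label at x = 3.0), and the helper names `knn_classify`, `loo_accuracy` and `candidates` are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]          # hold out point i
        correct += knn_classify(rest, point, k) == label
    return correct / len(data)

# Made-up toy data: "+" on the left, "-" on the right, one stray "-" at 3.0.
data = [((1.0,), "+"), ((2.1,), "+"), ((3.0,), "-"), ((4.2,), "+"),
        ((7.0,), "-"), ((8.1,), "-"), ((9.3,), "-")]

candidates = [1, 3, 5]                          # odd values avoid two-class ties
for k in candidates:
    print(k, loo_accuracy(data, k))

best_k = max(candidates, key=lambda k: loo_accuracy(data, k))
print(best_k)  # 3 on this toy data
```

On real data the same loop is usually run with k-fold splits rather than leave-one-out, since leave-one-out repeats the neighbour search once per training point.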

k-Nearest Neighbours — GATE DA

What you'll learn

kNN is a lazy learner — it does no training, it just stores the data

Classify by finding the k nearest neighbours (Euclidean / Manhattan) and taking a majority vote

Small k is flexible (low bias, high variance); large k is smooth (high bias, low variance)

Feature scaling matters because distances do

Last lesson left logistic regression boxed in by its own strength: one straight boundary, fixed once, helpless against classes that coil around each other. So swing to the opposite extreme — a classifier that learns no boundary at all, and instead decides each new case by simply asking its neighbours.

To label a new point, look at the points nearest to it and copy the majority. If your five nearest neighbours are three cats and two dogs, you call it a cat. That is the whole algorithm: you are like your neighbours. And what makes it strange is that there is no training. A linear model fits weights; a tree builds splits; this one does nothing up front — it just stores the training data and does all the work at prediction time, measuring distances to find who is near. For that it is called a lazy learner — flexible enough to trace any curvy boundary the data demands, at the price of slow predictions on big datasets.

Find the neighbours, take a vote

Two ingredients, and only two: a distance to decide who counts as near, and a vote to combine their labels.

With k = 3, the query’s three nearest points vote 2-to-1 for the positive class. Faraway points are ignored.

Distance. The usual choices are Euclidean (straight-line, √(Σ(aᵢ − bᵢ)²), with the square root taken over the whole sum) and Manhattan (sum of absolute coordinate differences, Σ|aᵢ − bᵢ|). For two 2-D points a and b:

Euclidean:  d = √[ (a₁−b₁)² + (a₂−b₂)² ]
Manhattan:  d = |a₁−b₁| + |a₂−b₂|
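The two formulas translate almost symbol for symbol into code. This is a minimal sketch; the function names `euclidean` and `manhattan` are illustrative, and the points a and b are the ones used later in Q4 of the quick check.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

a, b = (1, 2), (4, 6)
print(euclidean(a, b))  # 5.0, from sqrt(3**2 + 4**2)
print(manhattan(a, b))  # 7, from 3 + 4
```

Both functions work for any number of dimensions, as long as the two points have the same length.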

Vote. For classification, take the majority label among the k neighbours. For regression, average their values instead.
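Put together, the two ingredients fit in a few lines. The sketch below assumes Euclidean distance and breaks no ties specially; the helper names and both toy data sets (`pets`, `prices`) are made up for illustration, with the pets chosen to echo the "three cats and two dogs" picture from earlier.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k (point, target) pairs closest to query (Euclidean)."""
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]

def knn_classify(train, query, k):
    """Classification: majority label among the k neighbours."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k):
    """Regression: mean target value among the k neighbours."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)

# Made-up data: the five nearest points to the query are 3 cats and 2 dogs.
pets = [((0, 0), "cat"), ((1, 0), "cat"), ((0, 1), "dog"),
        ((1, 1), "cat"), ((2, 2), "dog"), ((9, 9), "dog")]
print(knn_classify(pets, (0.5, 0.5), 5))   # cat

# Made-up 1-D regression data: the 3 nearest targets are 20, 10 and 30.
prices = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]
print(knn_regress(prices, (2,), 3))        # 20.0
```

Note that all the work happens inside the prediction functions: there is no fit step, which is the "lazy" part.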

k controls the bias–variance trade-off

The only real knob is k, and — picking up exactly the dial the last lesson promised — it sets how wiggly the decision boundary is, sliding the model right along the bias-variance curve.

Small k (say k = 1): the prediction follows the single nearest point, so the boundary is jagged and chases noise — low bias, high variance.
Large k: the vote averages over many points, smoothing the boundary — higher bias, lower variance. Push k to the size of the dataset and every query just returns the overall majority class.

k is the smoothing dial: turn it up to reduce variance at the cost of bias.
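The two ends of the dial can be seen on a tiny example. This is a sketch on made-up 1-D data with one deliberately mislabelled point; `knn_classify` is an illustrative helper, not a library function.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical data: "+" on the left, "-" on the right,
# and one stray "-" at x = 3 playing the role of label noise.
train = [((1,), "+"), ((2,), "+"), ((3,), "-"), ((4,), "+"),
         ((7,), "-"), ((8,), "-"), ((9,), "-")]

query = (3.1,)
print(knn_classify(train, query, 1))          # -  (k = 1 copies the noisy point)
print(knn_classify(train, query, 3))          # +  (the vote outweighs the noise)
print(knn_classify(train, (1,), len(train)))  # -  (k = n returns the overall majority)
```

With k equal to the dataset size the query's position no longer matters: even a point deep in the "+" region is labelled with the global majority class.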

How GATE asks this

Typically an MCQ or MSQ on the properties: identify kNN as a lazy learner with no training phase (instance-based and non-parametric: it learns no fixed set of weights, it just keeps the data); state the effect of increasing k (smoother boundary, more bias, less variance); or choose the right distance metric. NAT versions hand you a tiny labelled set and a query point and ask you to compute a distance or name the predicted class.

Worked example — classify by the 3 nearest neighbours

Training points: A = (1, 2) is class +, B = (2, 3) is class +, C = (5, 5) is class −, D = (6, 4) is class −. Classify the query q = (3, 3) using k = 3 and Euclidean distance.

Compute each distance from q = (3, 3). We can compare squared distances — the ordering is identical, and it spares us the square roots:

d(q,A)² = (3−1)² + (3−2)² = 4 + 1 = 5     → d ≈ 2.24
d(q,B)² = (3−2)² + (3−3)² = 1 + 0 = 1     → d = 1.00
d(q,C)² = (3−5)² + (3−5)² = 4 + 4 = 8     → d ≈ 2.83
d(q,D)² = (3−6)² + (3−4)² = 9 + 1 = 10    → d ≈ 3.16

Sort by distance: B (1.00) < A (2.24) < C (2.83) < D (3.16). The 3 nearest are B(+), A(+), C(−). Vote: 2 (+) vs 1 (−) → predict + — and q did indeed sit nearer the lower-left + cluster, just as the eyeball suggested.

(Sanity check: with k = 1 we would use only B, which is also +. With k = 5 we would need more neighbours than the four points we have, so k must never exceed the dataset size.)
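The hand calculation can be checked in a few lines. This sketch uses the lesson's four training points and query; the variable names are illustrative, and the class − is written as the ASCII string "-".

```python
from collections import Counter

# Training set from the worked example: name -> (point, class)
train = {"A": ((1, 2), "+"), "B": ((2, 3), "+"),
         "C": ((5, 5), "-"), "D": ((6, 4), "-")}
q = (3, 3)

def squared_distance(a, b):
    """Squared Euclidean distance; it gives the same ordering as the distance itself."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

sq = {name: squared_distance(q, point) for name, (point, _) in train.items()}
nearest = sorted(sq, key=sq.get)[:3]          # names of the 3 closest points
votes = Counter(train[name][1] for name in nearest)
prediction = votes.most_common(1)[0][0]

print(sq)          # {'A': 5, 'B': 1, 'C': 8, 'D': 10}
print(nearest)     # ['B', 'A', 'C']
print(prediction)  # +
```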

In one breath

k-Nearest Neighbours is a lazy classifier that does no training: it just stores the data, then labels a new point by the majority vote of its k closest neighbours under a distance (Euclidean √(Σ(aᵢ−bᵢ)²) or Manhattan Σ|aᵢ−bᵢ|); the single knob k is the smoothing dial on the bias-variance curve (small k → jagged, low bias / high variance; large k → smooth, high bias / low variance), and because everything rides on distance you must scale your features first and prefer an odd k in two-class problems to avoid tied votes.
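Why scaling matters is easiest to see when two features live on very different ranges. The sketch below is an assumption-laden illustration: the (age, income) points are invented, and min-max scaling is just one common choice of scaler, not something the lesson prescribes.

```python
import math

def nearest(points, query):
    """Name of the training point closest to query (Euclidean)."""
    return min(points, key=lambda name: math.dist(points[name], query))

def min_max_scaler(rows):
    """Build a function mapping each feature to [0, 1] using the training range.
    Assumes every feature takes at least two distinct values."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    def scale(row):
        return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
    return scale

# Invented (age in years, income) points; income spans a far larger range.
points = {"P": (25, 50000), "Q": (60, 50500), "R": (30, 90000), "S": (55, 20000)}
query = (27, 50400)

print(nearest(points, query))   # Q: the income gap of 100 beats an age gap of 33

scale = min_max_scaler(list(points.values()))
scaled = {name: scale(p) for name, p in points.items()}
print(nearest(scaled, scale(query)))   # P: after scaling, the age gap counts too
```

The scaler is built from the training points only and then applied to the query, so both are measured on the same ruler.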

Practice

Quick check

Q1 (Recall): Which statements about kNN are TRUE? (select all that apply)

Q2 (Recall): As k increases (up to the dataset size), what happens to a kNN classifier?

Q3 (Recall): Why is an odd value of k often preferred for binary (two-class) classification?

Q4 (Trace): Compute the Manhattan distance between a = (1, 2) and b = (4, 6). (numerical answer)

Q5 (Trace): For the query q = (3, 3) and point A = (1, 2), what is the squared Euclidean distance d(q,A)²? (numerical answer)

Q6 (Apply): Using the lesson's training set, A = (1, 2): +, B = (2, 3): +, C = (5, 5): −, D = (6, 4): −, classify q = (3, 3) with k = 1 (Euclidean). Which class? (numerical answer: enter 1 for +, 0 for −)

A question to carry forward

kNN decides everything by geometry — who sits closest — and pays for it twice: it must hoard every training point forever, and it offers no sense of how sure it is, only a raw vote. Two cats and a dog says “cat,” but with what confidence? It cannot say.

So picture a wholly different way to classify, one that speaks in probabilities from the start. Instead of measuring distance, ask of each class a question: if the true class were this, how likely is the evidence I’m seeing? Score every class by that likelihood times how common the class is, and pick the winner. That is Bayes’ rule turned into a classifier — compact, fast, no points to store. Here is the thread onward: when the evidence is many features at once, computing their joint likelihood exactly is hopeless — so what single bold shortcut makes the probabilities cheap, and how badly does the shortcut have to be wrong before the answers go wrong with it?
