datarekha

k-Nearest Neighbours

A lazy learner with no training step: classify a point by the majority vote of its k closest neighbours, where k trades bias against variance.

6 min read Beginner GATE DA Lesson 86 of 122

What you'll learn

  • kNN is a lazy learner — it does no training, it just stores the data
  • Classify by finding the k nearest neighbours (Euclidean / Manhattan) and taking a majority vote
  • Small k is flexible (low bias, high variance); large k is smooth (high bias, low variance)
  • Feature scaling matters because distances do

Before you start

k-Nearest Neighbours is the most intuitive classifier there is: to label a new point, look at the points nearest to it and copy the majority. If your k = 5 nearest neighbours are 3 cats and 2 dogs, you call it a cat. That is the entire algorithm.

What makes it unusual is that there is no training. A linear model fits weights; a tree builds splits. kNN does nothing up front — it simply stores the training data. All the work happens at prediction time, when it measures distances to find the neighbours. That is why kNN is called a lazy learner (and why predictions can be slow on big datasets).

Find the neighbours, take a vote

Two ingredients: a distance to decide who is “near,” and a vote to combine their labels.

the k = 3 circle catches the 3 closest points● class +● class −◆ queryvote: 2 (+) vs 1 (−) → predict +
With k = 3, the query’s three nearest points vote 2-to-1 for the positive class. Faraway points are ignored.

Distance. The usual choices are Euclidean (straight-line, √Σ(aᵢ − bᵢ)²) and Manhattan (sum of absolute coordinate differences, Σ|aᵢ − bᵢ|). For two 2-D points a and b:

Euclidean:  d = √[ (a₁−b₁)² + (a₂−b₂)² ]
Manhattan:  d = |a₁−b₁| + |a₂−b₂|

Vote. For classification, take the majority label among the k neighbours. For regression, average their values instead.

k controls the bias–variance trade-off

The only real knob is k, and it directly sets how wiggly the decision boundary is:

  • Small k (e.g. k = 1): the prediction follows the nearest single point, so the boundary is jagged and chases noise — low bias, high variance.
  • Large k: the vote averages over many points, smoothing the boundary — higher bias, lower variance. Push k to the size of the dataset and every query just returns the overall majority class.
small klarge kjagged boundarylow bias, high variancesmooth boundaryhigh bias, low variance
k is the smoothing dial: turn it up to reduce variance at the cost of bias.

How GATE asks this

Typically an MCQ or MSQ on the properties: identify kNN as a lazy (instance-based, non-parametric — it learns no fixed set of weights, it just keeps the data) learner with no training phase; state the effect of increasing k (smoother boundary, more bias, less variance); or choose the right distance metric. NAT versions hand you a tiny labelled set and a query point and ask you to compute a distance or name the predicted class.

Worked example — classify by the 3 nearest neighbours

Training points: A = (1, 2) is class +, B = (2, 3) is class +, C = (5, 5) is class , D = (6, 4) is class . Classify the query q = (3, 3) using k = 3 and Euclidean distance.

Compute each distance from q = (3, 3) (we can compare squared distances — the ordering is the same, and it avoids square roots):

d(q,A)² = (3−1)² + (3−2)² = 4 + 1 = 5     → d ≈ 2.24
d(q,B)² = (3−2)² + (3−3)² = 1 + 0 = 1     → d = 1.00
d(q,C)² = (3−5)² + (3−5)² = 4 + 4 = 8     → d ≈ 2.83
d(q,D)² = (3−6)² + (3−4)² = 9 + 1 = 10    → d ≈ 3.16

Sort by distance: B (1.00) < A (2.24) < C (2.83) < D (3.16). The 3 nearest are B(+), A(+), C(−). Vote: 2 (+) vs 1 (−) → predict +.

(Sanity check: with k = 1 we’d use only B, also +. With k = 5 we’d be forced to look beyond the four points we have — k must not exceed the dataset size.)

Quick check

Quick check

0/6
Q1Compute the Manhattan distance between a = (1, 2) and b = (4, 6).numerical answer — type a number
Q2Using the lesson's training set — A=(1,2):+, B=(2,3):+, C=(5,5):−, D=(6,4):− — classify q=(3,3) with k=1 (Euclidean). Which class? (enter 1 for +, 0 for −)numerical answer — type a number
Q3As k increases (up to the dataset size), what happens to a kNN classifier?
Q4Which statements about kNN are TRUE? (select all that apply)select all that apply
Q5Why is an odd value of k often preferred for binary (two-class) classification?
Q6For the query q=(3,3) and points A=(1,2), C=(5,5), what are the squared Euclidean distances d(q,A)² and d(q,C)²? Enter d(q,A)². numerical answer — type a number

Practice this in an interview

All questions
How does k-nearest neighbours work, and why is it called a lazy learner?

KNN stores the entire training set and defers all computation to prediction time: for a new point it finds the k closest training examples by distance, then returns the majority class (classification) or mean value (regression). It is called lazy because there is no training phase — the model is the data itself.

How does k-means clustering work?

K-means partitions n points into k clusters by alternating between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. It repeats until assignments stop changing, which guarantees convergence but not a globally optimal solution.

How does the curse of dimensionality affect KNN?

In high-dimensional spaces all pairwise distances concentrate around the same value, so the concept of a 'nearest' neighbour breaks down — the k-th nearest neighbour is almost as far as every other point. KNN's accuracy degrades sharply as dimensionality increases unless the data has much lower intrinsic dimensionality.

What is k-means++ and why is it better than random initialisation?

K-means++ initialises centroids by probabilistically spacing them apart: the first centroid is chosen uniformly at random, and each subsequent centroid is chosen with probability proportional to its squared distance from the nearest already-chosen centroid. This reduces the chance of bad starts, cuts the number of iterations to convergence, and provides an O(log k) approximation guarantee on the final inertia.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content