datarekha

K-nearest neighbors

The simplest learner there is — no training, just store the data and let a new point's nearest neighbors vote. How k controls the bias-variance tradeoff, and why distance and scale matter.

6 min read Beginner Machine Learning Lesson 10 of 33

What you'll learn

  • How k-NN classifies by a majority vote of nearest neighbors (no training)
  • Why k is a bias-variance dial — small k overfits, large k underfits
  • Why you must scale features, and where k-NN struggles

Before you start

Every other model learns a function. k-nearest neighbors learns nothing — it just memorizes the training set and, when a new point arrives, looks at its k closest neighbors and takes a majority vote. It’s the simplest classifier in ML, and a perfect vehicle for seeing decision boundaries and the bias-variance tradeoff in action.

Store, then vote

There is no training phase. To predict a new point: compute its distance to every training point, take the k nearest, and return their majority class (for regression, their average). That’s it.

Move the cursor and slide k — watch the neighbors light up and the boundary reshape:

k is the bias-variance dial

The choice of k is the whole game, and it maps directly onto bias and variance:

  • Small k (k=1) — the boundary wraps every single point. Zero training error, but it memorizes noise — high variance, overfit, jagged.
  • Large k — each prediction averages over many points, smoothing the boundary. Too large and far-away points outvote the locals — high bias, underfit.

You tune k by cross-validation, and an odd k avoids ties in binary classification.

Quick check

Quick check

0/3
Q1What happens during k-NN 'training'?
Q2You set k=1 and get perfect training accuracy but poor test accuracy. Why?
Q3Why must you scale features before using k-NN?

Next

k-NN votes by distance; Naive Bayes votes by probability — a fast, surprisingly strong probabilistic baseline, especially for text.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Practice this in an interview

All questions
How does k-nearest neighbours work, and why is it called a lazy learner?

KNN stores the entire training set and defers all computation to prediction time: for a new point it finds the k closest training examples by distance, then returns the majority class (classification) or mean value (regression). It is called lazy because there is no training phase — the model is the data itself.

Why is KNN called a lazy learner, and what are the practical tradeoffs at inference time?

KNN is lazy because it does no real training; it just stores the training data and defers all computation to prediction time, when it searches for the nearest neighbors. The tradeoff is fast (zero) training but slow, memory-heavy inference that scales with dataset size. Approximate nearest-neighbor indexes and dimensionality reduction make it practical at scale.

In KNN, how do you choose k, and how does the curse of dimensionality affect it?

Choose k by cross-validation, balancing a small k (low bias, high variance, noisy) against a large k (smoother, higher bias); an odd k avoids ties in binary classification. The curse of dimensionality hurts KNN because in high dimensions all points become nearly equidistant, so 'nearest' loses meaning and accuracy degrades. Feature scaling and dimensionality reduction help.

What's the difference between k-means and k-nearest neighbors? People confuse them.

K-means is an unsupervised clustering algorithm that partitions unlabeled data into k groups by iteratively updating centroids. KNN is a supervised algorithm that classifies or predicts a new point using the labels of its k closest training points. They share the letter k and the use of distances but solve completely different problems.

Related lessons

Explore further

Skip to content