K-nearest neighbors
The simplest learner there is — no training, just store the data and let a new point's nearest neighbors vote. How k controls the bias-variance tradeoff, and why distance and scale matter.
What you'll learn
- How k-NN classifies by a majority vote of nearest neighbors (no training)
- Why k is a bias-variance dial — small k overfits, large k underfits
- Why you must scale features, and where k-NN struggles
Before you start
Every other model learns a function. k-nearest neighbors learns nothing — it
just memorizes the training set and, when a new point arrives, looks at its k
closest neighbors and takes a majority vote. It’s the simplest classifier in ML,
and a perfect vehicle for seeing decision boundaries and the bias-variance
tradeoff in action.
Store, then vote
There is no training phase. To predict a new point: compute its distance to every
training point, take the k nearest, and return their majority class (for
regression, their average). That’s it.
Move the cursor and slide k — watch the neighbors light up and the boundary
reshape:
k is the bias-variance dial
The choice of k is the whole game, and it maps directly onto
bias and variance:
- Small k (k=1) — the boundary wraps every single point. Zero training error, but it memorizes noise — high variance, overfit, jagged.
- Large k — each prediction averages over many points, smoothing the boundary. Too large and far-away points outvote the locals — high bias, underfit.
You tune k by cross-validation, and an odd k avoids
ties in binary classification.
Quick check
Quick check
Next
k-NN votes by distance; Naive Bayes votes by probability — a fast, surprisingly strong probabilistic baseline, especially for text.
Practice this in an interview
All questionsKNN stores the entire training set and defers all computation to prediction time: for a new point it finds the k closest training examples by distance, then returns the majority class (classification) or mean value (regression). It is called lazy because there is no training phase — the model is the data itself.
KNN is lazy because it does no real training; it just stores the training data and defers all computation to prediction time, when it searches for the nearest neighbors. The tradeoff is fast (zero) training but slow, memory-heavy inference that scales with dataset size. Approximate nearest-neighbor indexes and dimensionality reduction make it practical at scale.
Choose k by cross-validation, balancing a small k (low bias, high variance, noisy) against a large k (smoother, higher bias); an odd k avoids ties in binary classification. The curse of dimensionality hurts KNN because in high dimensions all points become nearly equidistant, so 'nearest' loses meaning and accuracy degrades. Feature scaling and dimensionality reduction help.
K-means is an unsupervised clustering algorithm that partitions unlabeled data into k groups by iteratively updating centroids. KNN is a supervised algorithm that classifies or predicts a new point using the labels of its k closest training points. They share the letter k and the use of distances but solve completely different problems.