What's the difference between k-means and k-nearest neighbors? People confuse them.

For: Data Scientist, Data Analyst, ML Engineer

The short answer

K-means is an unsupervised clustering algorithm that partitions unlabeled data into k groups by iteratively updating centroids. KNN is a supervised algorithm that classifies or predicts a new point using the labels of its k closest training points. They share the letter k and the use of distances but solve completely different problems.

How to think about it

The crisp answer

Beyond the letter k and a reliance on distance, they have little in common. K-means is unsupervised: it groups unlabeled data into k clusters. KNN is supervised: it predicts the label (or, for regression, the value) of a new point from the labels of its k nearest neighbors in the training set.

Why the confusion

Both use a distance metric and both have a hyperparameter called k, but k means different things. In k-means, k is the number of clusters you’re partitioning into. In KNN, k is how many neighbors vote on a prediction. Interviewers often ask for the KNN vs k-means comparison to check that a candidate can tell the two apart.

How each works

K-means: initialize k centroids, assign each point to the nearest centroid, recompute each centroid as the mean of its cluster, and repeat until the assignments stop changing. There’s a training phase that produces centroids.
KNN: no real training. It’s a lazy learner that stores the data. At prediction time it finds the k closest training points and takes a majority vote (classification) or an average (regression).
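
The two procedures above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names `kmeans` and `knn_predict` are made up for this example, the centroids are simply initialized from the first k points (real code would usually randomize or use a smarter scheme), and Euclidean distance is assumed.

```python
import math
from collections import Counter


def kmeans(points, k, iters=100):
    """Sketch of k-means: returns (centroids, assignments)."""
    # Assumption: take the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # stable: no point changed cluster
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments


def knn_predict(train_points, train_labels, query, k):
    """Classify query by majority vote among its k nearest training points."""
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(query, train_points[i]),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


points = [(0, 0), (0, 1), (10, 10), (10, 11)]

# K-means sees no labels and learns centroids up front.
centroids, assignments = kmeans(points, k=2)
print(assignments)  # -> [0, 0, 1, 1]

# KNN needs labels and does all its work at query time.
labels = ["a", "a", "b", "b"]
print(knn_predict(points, labels, query=(9, 9), k=3))  # -> b
```

Note where the work happens: `kmeans` loops over the data before it returns anything, while `knn_predict` has no fitting step at all and scans the stored training points on every call.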

Concrete example

Segmenting customers into 4 groups with no labels → k-means. Predicting whether a new customer churns based on the 5 most similar past customers → KNN.

The common trap

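Saying KNN “trains” a model. It doesn’t: it stores the training data, so the cost falls at inference and, with a brute-force search, grows with the size of the dataset, which is its main weakness. And k-means doesn’t classify new points into known categories; it discovers structure in unlabeled data. Both are distance-based, so both need feature scaling; otherwise features with larger numeric ranges dominate the distance. Follow-up: “Which is lazy and which is eager?” KNN is lazy (it defers computation to query time); k-means is eager (it does upfront work to learn centroids).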
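
To see why scaling matters for any distance-based method, here is a small sketch using made-up customer data (age in years, income in dollars). The helper names `standardize` and `nearest_index` are hypothetical, and z-score standardization is just one common choice of scaling. On the raw values, income differences swamp age differences; after standardizing, both features contribute on a comparable scale and the nearest neighbor changes.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its std dev."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, stds))
        for row in rows
    ]


def nearest_index(points, query_index):
    """Index of the point closest to points[query_index], excluding itself."""
    others = [i for i in range(len(points)) if i != query_index]
    return min(others, key=lambda i: math.dist(points[i], points[query_index]))


# Hypothetical customers: (age in years, income in dollars).
customers = [
    (25, 50_000),  # 0: the query customer
    (60, 50_500),  # 1: very different age, almost the same income
    (26, 58_000),  # 2: almost the same age, somewhat different income
    (45, 90_000),  # 3: different on both features
]

raw_match = nearest_index(customers, 0)
scaled_match = nearest_index(standardize(customers), 0)
print(raw_match)     # -> 1 (income dominates the raw distance)
print(scaled_match)  # -> 2 (age now counts as well)
```

The same effect applies to k-means, since its assignment step uses the same kind of distance.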
