How does k-means clustering work?

K-means partitions n points into k clusters by alternating between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. It repeats until assignments stop changing. Because neither step can increase the within-cluster sum of squares, the loop is guaranteed to converge, but only to a local optimum, not necessarily the globally optimal solution.
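The two steps can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names (`assign`, `update`, `kmeans`), the toy points and the choice to leave an empty cluster's centroid where it is are all assumptions made for the example.

```python
import math

def assign(points, centroids):
    """Assign step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Assumption for this sketch: an empty cluster keeps its old centroid.
            new_centroids.append(old)
        else:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Alternate assign and update until the assignments stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return centroids, labels

# Hypothetical toy data: two obvious groups and two rough starting centroids.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

On this toy data the loop settles after one update, with the first two points in one cluster and the last two in the other.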

What's the difference between k-means and k-nearest neighbors? People confuse them.

K-means is an unsupervised clustering algorithm that partitions unlabeled data into k groups by iteratively updating centroids. KNN is a supervised algorithm that classifies or predicts a new point using the labels of its k closest training points. They share the letter k and the use of distances but solve completely different problems.
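For contrast, here is a minimal KNN classifier sketch. Notice that it needs labels and has no training loop at all, which is the opposite of k-means. The function name `knn_predict`, the majority-vote rule and the toy data are assumptions chosen for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_points)),
                     key=lambda i: math.dist(train_points[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled training set: KNN cannot run without these labels.
train_points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_points, train_labels, (2.0, 2.0), k=3)
print(prediction)
```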

What is k-means++ and why is it better than random initialisation?

K-means++ initialises centroids by probabilistically spacing them apart: the first centroid is chosen uniformly at random from the data points, and each subsequent centroid is drawn from the data points with probability proportional to each point's squared distance from its nearest already-chosen centroid. This reduces the chance of bad starts, typically cuts the number of iterations to convergence, and gives an O(log k) approximation guarantee, in expectation, on the final inertia.
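The seeding rule can be sketched with the standard library's weighted sampling. This is an illustrative sketch only: the function name `kmeans_pp_init`, the fixed seed and the toy points are assumptions, and the code assumes the data contain at least k distinct points.

```python
import math
import random

def kmeans_pp_init(points, k, rng):
    """Pick k starting centroids from `points` using k-means++ style seeding."""
    # First centroid: uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest already-chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Far-away points get proportionally more weight; chosen points get weight 0.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

rng = random.Random(0)  # fixed seed so the sketch is repeatable
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeds = kmeans_pp_init(points, 2, rng)
print(seeds)
```

Because an already-chosen point has weight zero, the same point is never picked twice, and a point in the far group is much more likely to become the second seed than the first seed's close neighbour.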

What are the main limitations of k-means clustering?

K-means requires specifying k upfront, assumes clusters are convex and roughly equal in size and density, is sensitive to outliers and feature scale, and can converge to local minima. It struggles with non-globular shapes such as rings or crescents, and it assigns every point to exactly one cluster with no notion of uncertainty.

k-means & k-medoid Clustering — GATE DA

What you'll learn

k-means alternates two steps: assign points to the nearest centroid, then update each centroid to the mean of its points

It minimises the within-cluster sum of squared distances (WCSS)

It converges only to a local optimum, so the result depends on initialisation

k-medoid uses an actual data point as the centre, making it more robust to outliers

Last lesson took away the answer key. No more labels — just points scattered in space and a hunch that they fall into natural groups. Clustering is the task of acting on that hunch, and it asks a single question: which points belong together? A grouping is “good” when each point sits close to the centre of its own group, and the loop that finds one is simple enough to run by hand.

That loop is k-means, and its trick is to alternate between guessing the groups and refining them. Pick k centre points (centroids), then repeat two moves until nothing changes: every point joins the nearest centroid, and every centroid slides to the average of the points that just joined it. It is the workhorse behind customer segmentation, image colour-quantisation, and quick exploratory grouping whenever you have no labels to learn from.

The two-step loop

Think of the centroids as magnets and the points as iron filings. Each round, the filings snap to the closest magnet (assign), then each magnet recentres itself on its own cluster of filings (update). Repeat until the magnets stop moving.

One iteration = one assign step then one update step. The red × marks the recomputed centroid (the mean of its assigned points).

The loop is not aimless: it is greedily minimising the within-cluster sum of squares (WCSS) — the total squared distance from each point to its own centroid. That number is the precise meaning of “a good grouping.”

The assign step lowers WCSS by re-homing each point to a centroid that is at least as close; the update step lowers it because the mean is the point that minimises the total squared distance to a cluster's points.
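Written out, the claim about the mean is a one-line calculus argument. The notation here is an assumption introduced for this note: C_j is cluster j, \mu_j is its centroid, and x_i is a data point.

```latex
% WCSS: total squared distance from each point to its own centroid.
\[
\mathrm{WCSS} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
\]
% Update step: hold the assignments fixed and set the gradient
% with respect to \mu_j to zero.
\[
\frac{\partial\, \mathrm{WCSS}}{\partial \mu_j}
  = -2 \sum_{x_i \in C_j} (x_i - \mu_j) = 0
\quad\Longrightarrow\quad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
\]
```

So with the assignments fixed, the best possible centre for each cluster is exactly the mean of its points.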

Each step can only decrease (or hold) WCSS, and there are only finitely many ways to split the points into k groups, so the loop always halts. But it halts in whatever valley the starting centroids happened to roll into: a local optimum, not necessarily the best one.
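A tiny experiment makes the local-optimum trap concrete. The sketch below is self-contained and purely illustrative: the helper names `kmeans` and `wcss`, the four rectangle corners and both sets of starting centroids are assumptions chosen so that the two runs settle in different valleys.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain assign/update loop; stops when the assignments stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumption for this sketch: an empty cluster keeps its old centroid.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else old)
        centroids = updated
    return centroids, labels

def wcss(points, centroids, labels):
    """Total squared distance from each point to its own centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Four corners of a wide, short rectangle; the natural split is left vs right.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])   # start near the left and right sides
bad = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])    # start on the bottom and top edges

good_wcss = wcss(points, *good)
bad_wcss = wcss(points, *bad)
print(good_wcss, bad_wcss)  # 1.0 16.0
```

Both runs have converged, since no point wants to switch centroid, yet the second start is stuck with a top/bottom split that costs sixteen times as much. This is why restarts from several initialisations are common.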

How GATE asks this

The signature question is a NAT (numerical answer type): you are given a set of points and told (or you must work out) which points were assigned to a centroid, then asked for the updated centroid after one iteration. The recipe is fixed: collect the assigned points, then average their coordinates. (GATE DA 2024 ran a related conceptual MCQ: given two points in a cluster, which other point must also be in it? It rests on the same nearest-centroid reasoning.)

Worked example — one centroid-update iteration

Among a set of 2-D points, the ones nearest to centroid C3 = (6, 6) are (6, 6) and (9, 9). After one update step, where does C3 move?

The update rule says the new centroid is the mean of its assigned points. Average the coordinates separately:

assigned to C3:  (6, 6)  and  (9, 9)

new x = (6 + 9) / 2 = 15 / 2 = 7.5
new y = (6 + 9) / 2 = 15 / 2 = 7.5

C3  →  (7.5, 7.5)

So the updated centroid is (7.5, 7.5), the midpoint of the two assigned points. The whole task is “assign, then average,” nothing more.
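The same arithmetic can be checked in a couple of lines of Python. The helper name `centroid_of` is made up for this sketch.

```python
def centroid_of(points):
    """Mean of a list of points, computed coordinate by coordinate."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

new_c3 = centroid_of([(6, 6), (9, 9)])
print(new_c3)  # (7.5, 7.5)
```

The same helper works for any number of assigned points and any number of dimensions, which is all a centroid-update question ever needs.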

For data with outliers, reach for k-medoid instead. It works the same way, but a cluster’s centre must be an actual data point — the medoid, the point with the smallest total distance to the rest — not an abstract mean. Because no averaging is involved, a single far-flung outlier cannot drag the centre away, which makes k-medoid more robust than k-means.
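To see the robustness claim in action, compare the mean and the medoid of a small cluster with one outlier. The function names and the toy cluster are assumptions for illustration, and this sketch only computes the centre of a single cluster, not the full k-medoid loop.

```python
import math

def mean_point(points):
    """Centroid as used by k-means: the coordinate-wise average."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def medoid(points):
    """Medoid: the actual data point with the smallest total distance to the others."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# Four tightly packed points plus one far-flung outlier.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (50.0, 50.0)]

centre_mean = mean_point(cluster)
centre_medoid = medoid(cluster)
print(centre_mean, centre_medoid)
```

The mean lands at about (11.2, 11.2), far from every real point, while the medoid stays at (2.0, 2.0) inside the tight group.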

In one breath

k-means clusters unlabelled points by alternating two moves until nothing changes — assign each point to its nearest centroid, then update each centroid to the mean of its assigned points — and this greedily lowers the within-cluster sum of squares (the total squared point-to-centroid distance) every step, so it always halts, but only at a local optimum that depends on the random initialisation (hence multiple restarts); k-medoid swaps the mean for a real data point as the centre, trading a little cost for robustness to outliers.

Practice

Quick check

Q1. Recall: Which statements about standard k-means are TRUE? (select all that apply)

Q2. Recall: Why is it common practice to run k-means multiple times with different initial centroids?

Q3. Recall: How does k-medoid differ from k-means? (select all that apply)

Q4. Trace: A cluster contains the points (2, 4), (4, 4), and (6, 10). What is the x-coordinate of the updated k-means centroid? (numerical answer)

Q5. Trace: Centroid C3 = (6, 6) is assigned the points (6, 6) and (9, 9). After one update iteration, what is the updated centroid's coordinate value (both x and y are equal)? (numerical answer)

Q6. Apply: A cluster is assigned the points (0, 0), (0, 2), (4, 0), (4, 2). What is the x-coordinate of the updated centroid? (numerical answer)

A question to carry forward

k-means works, but look at what it asked of you up front: a number, k, the count of clusters — guessed before you have seen a single grouping. Guess wrong and the whole result is wrong. And because it rolls into whatever valley its random start finds, two runs can hand back two different answers.

Imagine instead a method that demands no k at all, and never gambles on a random start. It begins with every point alone, fuses the two closest, then the next two closest, and keeps going — recording the entire history as a tree you can slice at any height to read off however many clusters you want. Here is the thread onward: how does this merge-from-the-bottom approach work, what does it even mean to measure the distance between two clusters (not two points), and why does the rule you pick for that — nearest pair versus farthest pair — completely change the shape of the clusters you get?
