k-means & k-medoid Clustering
Clustering by alternation: assign every point to its nearest centroid, then move each centroid to the mean of its points. Repeat. A recurring GATE DA NAT.
What you'll learn
- k-means alternates two steps: assign points to the nearest centroid, then update each centroid to the mean of its points
- It minimises the within-cluster sum of squared distances (WCSS)
- It converges only to a local optimum, so the result depends on initialisation
- k-medoid uses an actual data point as the centre, making it more robust to outliers
Before you start
Unsupervised learning has no labels — only points in space. Clustering asks a
single question: which points belong together? k-means answers it with a loop so
simple you can run it by hand. Pick k centre points (centroids), then repeat two
moves until nothing changes: every point joins the nearest centroid, and every
centroid slides to the average of the points that just joined it. It is the
workhorse behind customer segmentation, image colour-quantisation, and quick
exploratory grouping when you have no labels to learn from.
The two-step loop
Think of the centroids as magnets and the points as filings. Each round, the filings snap to the closest magnet (assign), then each magnet recentres itself on its own cluster of filings (update). Repeat until the magnets stop moving.
The loop is not aimless: it is greedily minimising the within-cluster sum of squares (WCSS) — the total squared distance from each point to its own centroid.
Each step can only decrease (or hold) WCSS, so the loop always halts. But it halts at whatever valley the starting centroids happened to roll into — a local optimum.
How GATE asks this
The signature question is a NAT: you are given a set of points and told (or you must work out) which points were assigned to a centroid, then asked for the updated centroid after one iteration. The recipe is fixed — collect the assigned points, average their coordinates. (GATE DA 2024 ran a related conceptual MCQ: given two points in a cluster, which other point must also be in it — same nearest-centroid reasoning.)
Worked example — one centroid-update iteration
Among a set of 2-D points, the ones nearest to centroid
C3 = (6, 6)are(6, 6)and(9, 9). After one update step, where doesC3move?
The update rule says: the new centroid is the mean of its assigned points. Average the coordinates separately.
assigned to C3: (6, 6) and (9, 9)
new x = (6 + 9) / 2 = 15 / 2 = 7.5
new y = (6 + 9) / 2 = 15 / 2 = 7.5
C3 → (7.5, 7.5)
So the updated centroid is (7.5, 7.5) — note that the whole task is just “assign, then average,” nothing more.
For data with outliers, reach for k-medoid instead. It works the same way, but a cluster’s centre must be an actual data point (the medoid — the point with the smallest total distance to the rest), not an abstract mean. Because no averaging is involved, a single far-flung outlier cannot drag the centre away, making k-medoid more robust than k-means.
Quick check
Quick check
Practice this in an interview
All questionsK-means partitions n points into k clusters by alternating between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. It repeats until assignments stop changing, which guarantees convergence but not a globally optimal solution.
K-means++ initialises centroids by probabilistically spacing them apart: the first centroid is chosen uniformly at random, and each subsequent centroid is chosen with probability proportional to its squared distance from the nearest already-chosen centroid. This reduces the chance of bad starts, cuts the number of iterations to convergence, and provides an O(log k) approximation guarantee on the final inertia.
K-means requires specifying k upfront, assumes clusters are convex and roughly equal in size and density, is sensitive to outliers and feature scale, and can converge to local minima. It struggles with non-globular shapes such as rings or crescents, and it assigns every point to exactly one cluster with no notion of uncertainty.
The elbow method plots inertia against k and looks for the bend where adding another cluster gives diminishing returns. The silhouette score measures how similar each point is to its own cluster versus its nearest rival, with values closer to 1 indicating tighter, better-separated clusters. Both should be used together, not in isolation.