datarekha

k-means & k-medoid Clustering

Clustering by alternation: assign every point to its nearest centroid, then move each centroid to the mean of its points. Repeat. A recurring GATE DA NAT.

8 min read Intermediate GATE DA Lesson 94 of 122

What you'll learn

  • k-means alternates two steps: assign points to the nearest centroid, then update each centroid to the mean of its points
  • It minimises the within-cluster sum of squared distances (WCSS)
  • It converges only to a local optimum, so the result depends on initialisation
  • k-medoid uses an actual data point as the centre, making it more robust to outliers

Before you start

Unsupervised learning has no labels — only points in space. Clustering asks a single question: which points belong together? k-means answers it with a loop so simple you can run it by hand. Pick k centre points (centroids), then repeat two moves until nothing changes: every point joins the nearest centroid, and every centroid slides to the average of the points that just joined it. It is the workhorse behind customer segmentation, image colour-quantisation, and quick exploratory grouping when you have no labels to learn from.

The two-step loop

Think of the centroids as magnets and the points as filings. Each round, the filings snap to the closest magnet (assign), then each magnet recentres itself on its own cluster of filings (update). Repeat until the magnets stop moving.

Step 1 — assign to nearest centroideach point picks its closest centreStep 2 — update centroid = meaneach centre (×) jumps to its cluster average
One iteration = one assign step then one update step. The red × marks the recomputed centroid (the mean of its assigned points).

The loop is not aimless: it is greedily minimising the within-cluster sum of squares (WCSS) — the total squared distance from each point to its own centroid.

WCSS = ∑ₖ ∑ₓ∈ₖ ‖x − μₖ‖²sum, over clusters k, of squared distances from each point x to its centroid μ
The assign step lowers WCSS by re-homing points; the update step lowers it because the mean is the point that minimises squared distance.

Each step can only decrease (or hold) WCSS, so the loop always halts. But it halts at whatever valley the starting centroids happened to roll into — a local optimum.

How GATE asks this

The signature question is a NAT: you are given a set of points and told (or you must work out) which points were assigned to a centroid, then asked for the updated centroid after one iteration. The recipe is fixed — collect the assigned points, average their coordinates. (GATE DA 2024 ran a related conceptual MCQ: given two points in a cluster, which other point must also be in it — same nearest-centroid reasoning.)

Worked example — one centroid-update iteration

Among a set of 2-D points, the ones nearest to centroid C3 = (6, 6) are (6, 6) and (9, 9). After one update step, where does C3 move?

The update rule says: the new centroid is the mean of its assigned points. Average the coordinates separately.

assigned to C3:  (6, 6)  and  (9, 9)

new x = (6 + 9) / 2 = 15 / 2 = 7.5
new y = (6 + 9) / 2 = 15 / 2 = 7.5

C3  →  (7.5, 7.5)

So the updated centroid is (7.5, 7.5) — note that the whole task is just “assign, then average,” nothing more.

For data with outliers, reach for k-medoid instead. It works the same way, but a cluster’s centre must be an actual data point (the medoid — the point with the smallest total distance to the rest), not an abstract mean. Because no averaging is involved, a single far-flung outlier cannot drag the centre away, making k-medoid more robust than k-means.

Quick check

Quick check

0/6
Q1A cluster contains the points (2, 4), (4, 4), and (6, 10). What is the x-coordinate of the updated k-means centroid? (the centroid is the mean of the assigned points)numerical answer — type a number
Q2Centroid C3 = (6, 6) is assigned the points (6, 6) and (9, 9). After one update iteration, what is the updated centroid's coordinate value (both x and y are equal)?numerical answer — type a number
Q3A cluster is assigned the points (0, 0), (0, 2), (4, 0), (4, 2). What is the updated centroid? (give the x-coordinate)numerical answer — type a number
Q4Which statements about standard k-means are TRUE? (select all that apply)select all that apply
Q5How does k-medoid differ from k-means? (select all that apply)select all that apply
Q6Why is it common practice to run k-means multiple times with different initial centroids?

Practice this in an interview

All questions
How does k-means clustering work?

K-means partitions n points into k clusters by alternating between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. It repeats until assignments stop changing, which guarantees convergence but not a globally optimal solution.

What is k-means++ and why is it better than random initialisation?

K-means++ initialises centroids by probabilistically spacing them apart: the first centroid is chosen uniformly at random, and each subsequent centroid is chosen with probability proportional to its squared distance from the nearest already-chosen centroid. This reduces the chance of bad starts, cuts the number of iterations to convergence, and provides an O(log k) approximation guarantee on the final inertia.

What are the main limitations of k-means clustering?

K-means requires specifying k upfront, assumes clusters are convex and roughly equal in size and density, is sensitive to outliers and feature scale, and can converge to local minima. It struggles with non-globular shapes such as rings or crescents, and it assigns every point to exactly one cluster with no notion of uncertainty.

How do you choose the number of clusters k in k-means?

The elbow method plots inertia against k and looks for the bend where adding another cluster gives diminishing returns. The silhouette score measures how similar each point is to its own cluster versus its nearest rival, with values closer to 1 indicating tighter, better-separated clusters. Both should be used together, not in isolation.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content