datarekha
Machine Learning Easy Asked at GoogleAsked at AmazonAsked at MetaAsked at Microsoft

How does k-means clustering work?

The short answer

K-means partitions n points into k clusters by alternating between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. It repeats until assignments stop changing, which guarantees convergence but not a globally optimal solution.

How to think about it

K-means is an iterative coordinate-descent algorithm that minimises within-cluster sum of squared distances (inertia). Nail the two-step loop, and then explain why the result is only a local minimum.

The algorithm

  1. Initialize k centroids — randomly, or with k-means++ (spread initial centroids to cut bad starts).
  2. Assign every point to the closest centroid using Euclidean distance.
  3. Update each centroid to the mean of all points assigned to it.
  4. Repeat steps 2–3 until assignments do not change (or change falls below a tolerance).

Convergence is guaranteed because inertia can only decrease or stay flat at each step, and the number of possible partitions is finite. But it may converge to a local minimum, so scikit-learn runs it n_init times (default 10) and keeps the run with the lowest inertia.

Assign pointsnearest centroidUpdate centroidsmean of clusterrepeat until stableConverged
Assign then update, until cluster membership stabilises.

Quick Python

from sklearn.cluster import KMeans

km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42)
labels = km.fit_predict(X)          # integer cluster label per point
centroids = km.cluster_centers_     # shape (k, n_features)
inertia = km.inertia_               # within-cluster SSE
Learn it properly The scikit-learn API

Keep practising

All Machine Learning questions

Explore further

Skip to content