Machine Learning Easy Asked at GoogleAsked at AmazonAsked at MetaAsked at Microsoft

How does k-means clustering work?

For Data Scientist ML Engineer AI / LLM Engineer

The short answer

K-means partitions n points into k clusters by alternating between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. It repeats until assignments stop changing, which guarantees convergence but not a globally optimal solution.

How to think about it

K-means is an iterative coordinate-descent algorithm that minimises within-cluster sum of squared distances (inertia). Nail the two-step loop, and then explain why the result is only a local minimum.

The algorithm

Initialize k centroids — randomly, or with k-means++ (spread initial centroids to cut bad starts).
Assign every point to the closest centroid using Euclidean distance.
Update each centroid to the mean of all points assigned to it.
Repeat steps 2–3 until assignments do not change (or change falls below a tolerance).

Convergence is guaranteed because inertia can only decrease or stay flat at each step, and the number of possible partitions is finite. But it may converge to a local minimum, so scikit-learn runs it n_init times (default 10) and keeps the run with the lowest inertia.

Assign then update, until cluster membership stabilises.

Quick Python

from sklearn.cluster import KMeans

km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42)
labels = km.fit_predict(X)          # integer cluster label per point
centroids = km.cluster_centers_     # shape (k, n_features)
inertia = km.inertia_               # within-cluster SSE

Learn it properly The scikit-learn API

How does k-means clustering work?

The algorithm

Quick Python

Keep practising

Explore further