How does k-means clustering work?
K-means partitions n points into k clusters by alternating between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. It repeats until assignments stop changing, which guarantees convergence but not a globally optimal solution.
How to think about it
K-means is an iterative coordinate-descent algorithm that minimises within-cluster sum of squared distances (inertia). Nail the two-step loop, and then explain why the result is only a local minimum.
The algorithm
- Initialize k centroids — randomly, or with k-means++ (spread initial centroids to cut bad starts).
- Assign every point to the closest centroid using Euclidean distance.
- Update each centroid to the mean of all points assigned to it.
- Repeat steps 2–3 until assignments do not change (or change falls below a tolerance).
Convergence is guaranteed because inertia can only decrease or stay flat at each step, and the number of possible partitions is finite. But it may converge to a local minimum, so scikit-learn runs it n_init times (default 10) and keeps the run with the lowest inertia.
Quick Python
from sklearn.cluster import KMeans
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42)
labels = km.fit_predict(X) # integer cluster label per point
centroids = km.cluster_centers_ # shape (k, n_features)
inertia = km.inertia_ # within-cluster SSE