How do you choose k in k-means, and when does k-means fail?

Choose k using the elbow method on within-cluster sum of squares, the silhouette score, or domain knowledge, and check that the result is stable across runs. K-means struggles when clusters are non-spherical, have very different sizes or densities, or when outliers are present, and it cannot infer k for you. It also only finds a local optimum, so initialization (k-means++) matters.

What's the difference between k-means and k-nearest neighbors? People confuse them.

K-means is an unsupervised clustering algorithm that partitions unlabeled data into k groups by iteratively updating centroids. KNN is a supervised algorithm that classifies or predicts a new point using the labels of its k closest training points. They share the letter k and the use of distances but solve completely different problems.
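
A minimal sketch of the contrast, assuming scikit-learn is installed. The dataset and parameter values here are illustrative choices, not part of the original answer. The thing to notice is which `fit` call receives labels.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Toy data: X are the points, y are ground-truth labels (illustrative only).
X, y = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

# K-means is unsupervised: fit() sees only X, and k is the number of clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_  # arbitrary cluster indices, not class labels

# KNN is supervised: fit() needs y, and k is the number of neighbors consulted.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
predicted = knn.predict(X[:5])

print("k-means cluster ids:", cluster_ids[:5])
print("KNN predicted labels:", predicted)
```

The cluster indices from k-means carry no meaning beyond "same group", so they will not necessarily line up with the values in `y` even when the grouping itself is right.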

How does k-means clustering work?

K-means partitions n points into k clusters by alternating between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. It repeats until assignments stop changing, which guarantees convergence but not a globally optimal solution.
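
The convergence claim can be made concrete with the objective k-means minimizes. The notation below ($x_i$ for points, $c_i$ for cluster assignments, $\mu_j$ for centroids, $C_j$ for the set of points in cluster $j$) is introduced here for illustration and does not come from the original answer.

```latex
% Objective (inertia): total squared distance to the assigned centroid
J(c, \mu) = \sum_{i=1}^{n} \lVert x_i - \mu_{c_i} \rVert^2

% Assign step: minimizes J over c with the centroids held fixed
c_i \leftarrow \arg\min_{j \in \{1,\dots,k\}} \lVert x_i - \mu_j \rVert^2

% Update step: minimizes J over \mu with the assignments held fixed
\mu_j \leftarrow \frac{1}{\lvert C_j \rvert} \sum_{i \in C_j} x_i,
\qquad C_j = \{\, i : c_i = j \,\}
```

Neither step can increase $J$, and there are only finitely many ways to assign $n$ points to $k$ clusters (at most $k^n$), so the loop has to stop. Nothing in that argument says the stopping point is the best possible partition, which is why the result is only a local optimum.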

How do you choose the number of clusters k in k-means?

The elbow method plots inertia against k and looks for the bend where adding another cluster gives diminishing returns. The silhouette score measures how similar each point is to its own cluster versus its nearest rival, with values closer to 1 indicating tighter, better-separated clusters. Both should be used together, not in isolation.
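
To make "own cluster versus nearest rival" concrete, here is a hedged sketch that computes the silhouette of each point by hand and compares it with scikit-learn's `silhouette_samples`. The helper `silhouette_of_point` is a hypothetical name written for this example, and the dataset is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

X, _ = make_blobs(n_samples=200, centers=3, cluster_std=1.0, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

def silhouette_of_point(i, X, labels):
    """Silhouette s = (b - a) / max(a, b) for the point at index i."""
    dists = np.linalg.norm(X - X[i], axis=1)
    own = labels == labels[i]
    n_own = own.sum()
    if n_own == 1:
        return 0.0  # usual convention for a cluster with a single point
    # a: mean distance to the other points in the same cluster (self excluded)
    a = dists[own].sum() / (n_own - 1)
    # b: mean distance to the points of the nearest other cluster
    b = min(dists[labels == c].mean() for c in np.unique(labels) if c != labels[i])
    return (b - a) / max(a, b)

manual = np.array([silhouette_of_point(i, X, labels) for i in range(len(X))])
reference = silhouette_samples(X, labels)
print("matches scikit-learn:", np.allclose(manual, reference, atol=1e-6))
print("mean silhouette:", manual.mean())
```

The mean of these per-point values is what `silhouette_score` reports, so a single number near 1 means most points sit much closer to their own cluster than to the next one.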

K-means clustering — Machine Learning

Everything so far has been supervised — you had labels. But often you just have data and a question: are there natural groups in here? Customer segments, document topics, image colors. That’s unsupervised learning, and k-means is where it starts. It’s beautifully simple — two steps repeated — and it’s a near-guaranteed interview topic.

Two steps, repeated until it settles

[Interactive demo: pick k, then step the algorithm. Each step assigns each point to its nearest centroid, then updates each centroid to the mean of its points. Inertia falls with each step, and the elbow plot of inertia vs k bends at the true number of clusters (3).]

K-means alternates assign (each point to nearest centroid) and update (centroid → mean of its points) until nothing moves — minimizing inertia. It can't tell you k, so you read the elbow (where adding clusters stops helping) or a silhouette score. And a bad random start can get stuck — real implementations use k-means++ init and several restarts.

Two steps, repeated

You tell k-means how many clusters to find (k). It then loops Lloyd’s algorithm:

Assign — put each point in the cluster of its nearest centroid.
Update — move each centroid to the mean of the points assigned to it.

Repeat until nothing moves. Each round lowers inertia (the total squared distance from points to their assigned centroid) until it settles into a local minimum. At convergence, the result is round clusters split by straight boundaries.

That’s the whole algorithm. No labels, no gradient descent — just alternating assignment and averaging.
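
The loop above can be sketched in plain NumPy. This is a minimal illustration rather than how any particular library implements it: the function name `lloyd_kmeans`, the naive random initialization, and the choice to leave an empty cluster's centroid where it is are all assumptions made for this example.

```python
import numpy as np

def lloyd_kmeans(X, k, max_iter=100, seed=0):
    """Minimal Lloyd's algorithm: returns (centroids, labels, inertia)."""
    rng = np.random.default_rng(seed)
    # Naive initialization (assumption): k distinct data points picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Assign: index of the nearest centroid for every point.
        sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # nothing moved, so the loop has converged
        labels = new_labels
        # Update: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    inertia = ((X - centroids[labels]) ** 2).sum()
    return centroids, labels, inertia

# Three synthetic blobs (illustrative data only).
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(loc=center, scale=0.5, size=(50, 2))
    for center in ([0, 0], [5, 5], [0, 5])
])
centroids, labels, inertia = lloyd_kmeans(X, k=3)
print(centroids)
print("inertia:", inertia)
```

Changing `seed` changes the starting centroids, and with an unlucky start the same code can settle on a worse partition with higher inertia.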

The two things that bite

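Initialization matters. A bad random start can leave centroids stuck in a poor configuration. The fix is k-means++, which spreads the initial centroids out, plus running several restarts and keeping the best. scikit-learn uses init="k-means++" by default, but note that n_init="auto" (the default in recent versions) runs only a single initialization when combined with k-means++, so set n_init explicitly (for example n_init=10) if you want several restarts.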
You have to choose k. k-means can’t discover the number of clusters. Two standard methods:
- Elbow — plot inertia vs k. It always decreases, but bends sharply at the “right” k — the curve flattens noticeably once you pass k=3.
- Silhouette score — measures how well-separated the clusters are (−1 to 1); pick the k that maximizes it. More reliable than the elbow when it’s ambiguous.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data generated with 3 true clusters
X, _ = make_blobs(n_samples=600, centers=3, cluster_std=1.1, random_state=0)

silhouettes = {}
print(f"{'k':>2} {'inertia':>10} {'silhouette':>11}")
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init="auto", random_state=0).fit(X)
    sil = silhouette_score(X, km.labels_)
    silhouettes[k] = sil
    print(f"{k:2d} {km.inertia_:10.0f} {sil:11.3f}")

# Inertia keeps dropping as k grows; report where the silhouette actually peaks.
best_k = max(silhouettes, key=silhouettes.get)
print(f"\nSilhouette peaks at k={best_k} (data was generated with 3 clusters).")
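
Back to the initialization point: the idea behind k-means++ is to pick each new starting centroid with probability proportional to its squared distance from the centroids already chosen. Below is a rough NumPy sketch of that seeding rule. The function name `kmeans_pp_init` is hypothetical, and library implementations (scikit-learn's included) differ in detail, so treat this as an illustration of the idea only.

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """Sketch of k-means++ seeding: returns k initial centroids drawn from X."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # First centroid: a data point chosen uniformly at random.
    centers = [X[rng.integers(n)]]
    # d2[i] = squared distance from point i to its nearest chosen centroid.
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        # Far-away points are more likely to be picked; already-chosen
        # points have d2 == 0 and so cannot be picked again.
        # (Assumes the data is not all one repeated point, else d2.sum() is 0.)
        probs = d2 / d2.sum()
        idx = rng.choice(n, p=probs)
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers)

# Three synthetic blobs (illustrative data only).
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal(loc=center, scale=0.5, size=(50, 2))
    for center in ([0, 0], [5, 5], [0, 5])
])
centers = kmeans_pp_init(X, k=3)
print(centers)
```

Because the draw is still random, spread-out seeds are likely but not guaranteed, which is why seeding is usually paired with several restarts.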

In one breath

K-means is unsupervised: no labels, just “are there natural groups in here?”
It loops two steps — assign each point to its nearest centroid, then move each centroid to the mean of its points — until nothing moves, lowering inertia each round.
Initialization matters; k-means++ plus several restarts makes bad local minima less likely. scikit-learn defaults to k-means++, but set n_init explicitly if you want multiple restarts.
It can’t discover k — you choose it with the elbow (where inertia bends) or, more reliably, the silhouette score.
It assumes round, similar-sized blobs and draws straight boundaries; for crescents, rings, or varying density, reach for DBSCAN.
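
A quick way to see the last point is to run both algorithms on two interleaved crescents and score them against the known grouping. This is a sketch with hand-picked settings: the `eps` and `min_samples` values are assumptions tuned for this toy dataset and would need adjusting for other data.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved crescents with a little noise (illustrative data only).
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)

# K-means splits the plane with a straight boundary, cutting across both crescents.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN groups points connected through dense regions, so it can follow the curves.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means the grouping matches the true crescents exactly.
ari_kmeans = adjusted_rand_score(y_true, km_labels)
ari_dbscan = adjusted_rand_score(y_true, db_labels)
print(f"k-means ARI: {ari_kmeans:.3f}")
print(f"DBSCAN  ARI: {ari_dbscan:.3f}")
```

The adjusted Rand index is used here because cluster numbering is arbitrary, and it compares groupings without caring which cluster is called 0 or 1.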

Quick check

Q1. What are the two repeated steps of k-means (Lloyd's algorithm)?

Q2. How do you choose the number of clusters k?

Q3. Your data forms two interleaved crescent shapes. Will k-means cluster them correctly?

K-means fails on non-blob shapes — DBSCAN & hierarchical clustering handle those. And to see high-dimensional clusters at all, you first reduce dimensions with PCA.
