How do you choose the number of clusters k in k-means?
The elbow method plots inertia against k and looks for the bend where adding another cluster gives diminishing returns. The silhouette score measures how similar each point is to its own cluster versus its nearest rival, with values closer to 1 indicating tighter, better-separated clusters. Both should be used together, not in isolation.
How to think about it
Choosing k is a model-selection problem with no ground-truth label, so you triangulate with at least two signals rather than trusting a single metric.
Elbow method
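Run k-means for k = 2, 3, …, 10 and record inertia (within-cluster SSE). Plot inertia vs k. Inertia always falls as k grows (it reaches zero when every point is its own cluster), so you cannot simply minimise it. Instead, look for the “elbow”: the point where the curve bends and further gains in tightness flatten out. The method is a heuristic: real data often produces a smooth curve with no sharp bend, forcing you to pick a range rather than a single point.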
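Reading the bend off a plot is subjective, so it can help to locate it programmatically. The sketch below is one simple way to do that, not a standard API: it rescales the curve to the unit square and returns the k whose point lies farthest below the straight line joining the first and last points. The helper name `find_elbow` and the inertia values are made up for illustration.

```python
def find_elbow(ks, inertias):
    """Return the k farthest below the chord joining the curve's endpoints.

    Hypothetical helper: a simple geometric heuristic, not a library function.
    """
    ks = list(ks)
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three (k, inertia) pairs")
    k_span = ks[-1] - ks[0]
    inertia_span = inertias[0] - inertias[-1]
    if k_span <= 0 or inertia_span <= 0:
        raise ValueError("expected increasing k and decreasing inertia")
    # Rescale both axes to [0, 1] so the answer does not depend on units.
    xs = [(k - ks[0]) / k_span for k in ks]
    ys = [(v - inertias[-1]) / inertia_span for v in inertias]
    # After rescaling, the chord runs from (0, 1) to (1, 0), i.e. y = 1 - x.
    # The vertical gap below it is proportional to the perpendicular distance.
    gaps = [(1 - x) - y for x, y in zip(xs, ys)]
    best = max(range(len(ks)), key=gaps.__getitem__)
    return ks[best]


# Illustrative inertia values only (not measured on real data).
example_ks = range(2, 9)
example_inertias = [1000, 400, 150, 120, 100, 85, 75]
elbow_k = find_elbow(example_ks, example_inertias)
print(elbow_k)
```

On a smooth curve this still returns some k, so treat the result as the centre of a candidate range rather than a final answer.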
Silhouette score
For each point i, compute:
- a(i): mean distance to other points in the same cluster (cohesion).
- b(i): mean distance to the points of the nearest other cluster, i.e. the smallest such mean over all clusters that i does not belong to (separation).
Silhouette for point i = (b(i) - a(i)) / max(a(i), b(i)).
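The mean over all points gives the overall silhouette score. Values range from -1 (the point is probably in the wrong cluster) through 0 (the point sits on the boundary between two clusters) to +1 (tight, well-separated). The score needs at least two clusters, so it is undefined for k = 1. A common rule is to pick the k that maximises the mean.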
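To make the definition concrete, here is a minimal from-scratch sketch of the per-point silhouette using only the standard library. It assumes Euclidean distance and scores a point that is alone in its cluster as 0, which is a common convention. The helper name `silhouette_values` and the toy points are illustrative.

```python
import math


def silhouette_values(points, labels):
    """Per-point silhouette (b - a) / max(a, b) with Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a(i): mean distance to the other members of i's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Two tight pairs, five units apart.
pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = silhouette_values(pts, [0, 0, 1, 1])
bad = silhouette_values(pts, [0, 1, 0, 1])
mean_good = sum(good) / len(good)  # about 0.80
mean_bad = sum(bad) / len(bad)     # negative: each point is closer to the other cluster
print(mean_good, mean_bad)
```

Grouping the nearby points together gives a high mean score, while the deliberately bad labelling pushes every point's score below zero.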
Other signals
| Method | What it offers |
|---|---|
| Gap statistic | Compares inertia to a null reference distribution; more principled but slow |
| Domain knowledge | Often the strongest signal — “we have 4 product tiers” |
| Downstream metric | If clusters feed a model, optimise that model’s performance directly |
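from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import numpy as np

inertias, sil_scores = [], []
ks = range(2, 11)  # silhouette is undefined for k = 1
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)  # X: (n_samples, n_features) array, assumed already loaded
    inertias.append(km.inertia_)  # within-cluster SSE, for the elbow plot
    sil_scores.append(silhouette_score(X, km.labels_))  # mean silhouette over all points
best_k = ks[np.argmax(sil_scores)]  # k with the highest mean silhouette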
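The gap statistic from the table can be sketched in the same style. This is a simplified version under stated assumptions, not a reference implementation: the null reference is uniform noise over the data's bounding box, k-means inertia stands in for the within-cluster dispersion, and k is chosen as the smallest value whose gap is within one standard error of the next. The function names `gap_statistic` and `choose_k_by_gap`, the number of reference draws, and the synthetic blobs are all illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans


def gap_statistic(X, ks, n_refs=10, seed=0):
    """Return (gaps, sds): gap values and adjusted standard errors per k."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)  # bounding box of the data
    gaps, sds = [], []
    for k in ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        log_w = np.log(model.fit(X).inertia_)
        # Log-dispersion of k-means on structureless reference data sets.
        ref_log_w = np.array([
            np.log(
                KMeans(n_clusters=k, n_init=10, random_state=seed)
                .fit(rng.uniform(lo, hi, size=X.shape))
                .inertia_
            )
            for _ in range(n_refs)
        ])
        gaps.append(ref_log_w.mean() - log_w)
        sds.append(ref_log_w.std() * np.sqrt(1 + 1 / n_refs))
    return np.array(gaps), np.array(sds)


def choose_k_by_gap(ks, gaps, sds):
    """Smallest k with gap(k) >= gap(k+1) - sd(k+1); assumes consecutive ks."""
    ks = list(ks)
    for i in range(len(ks) - 1):
        if gaps[i] >= gaps[i + 1] - sds[i + 1]:
            return ks[i]
    return ks[-1]


# Synthetic demo: three well-separated Gaussian blobs.
demo_rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_demo = np.vstack([c + demo_rng.normal(scale=0.5, size=(50, 2)) for c in centers])

gap_ks = range(1, 7)  # unlike the silhouette, the gap statistic can evaluate k = 1
gaps, sds = gap_statistic(X_demo, gap_ks)
k_gap = choose_k_by_gap(gap_ks, gaps, sds)
print(k_gap)
```

Each k costs `n_refs` extra k-means fits, which is why the method is slow on large data. Because it can evaluate k = 1, it can also suggest that the data has no cluster structure at all.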