How do hierarchical clustering and DBSCAN differ from k-means?
Hierarchical clustering builds a tree of nested merges or splits and does not require specifying k upfront, but it is O(n² log n) and cannot revise early decisions. DBSCAN finds arbitrarily shaped clusters by density reachability, naturally marks outliers as noise, and also needs no k — but its results are sensitive to the eps and min_samples hyperparameters.
How to think about it
K-means, hierarchical, and DBSCAN are three distinct philosophies. Pick the right one based on data shape, scale, and whether outlier detection matters.
Hierarchical clustering
Agglomerative (bottom-up): start with each point as its own cluster and greedily merge the pair that minimises a linkage criterion — complete (max distance), average, or Ward (minimises inertia increase). The result is a dendrogram that encodes every possible k simultaneously; you choose k by cutting the tree at a height.
Strengths: no k needed upfront; the dendrogram reveals nested structure; works with any distance metric.
Weaknesses: O(n² log n) time and O(n²) memory — painful above ~50 k points; early merges are irreversible, so one bad merge propagates.
DBSCAN
DBSCAN labels points as:
- Core: has at least
min_samplesneighbours within radiuseps. - Border: within
epsof a core point but not itself core. - Noise: neither — these become the -1 label.
Clusters grow by density reachability: core points that are within eps of each other belong to the same cluster. Clusters can be any shape.
Strengths: discovers arbitrary shapes; built-in outlier detection; no k needed.
Weaknesses: two hyperparameters (eps, min_samples) that are hard to set; fails when clusters have varying density (HDBSCAN fixes this); performance degrades in high dimensions because distance concentration makes eps meaningless.
Quick comparison
| Property | k-means | Hierarchical | DBSCAN |
|---|---|---|---|
| Needs k | Yes | No (cut later) | No |
| Cluster shape | Convex | Any (with right linkage) | Any |
| Outliers | Forced into cluster | Forced into cluster | Labeled as noise |
| Scale | O(nk) | O(n² log n) | O(n log n) with index |
from sklearn.cluster import AgglomerativeClustering, DBSCAN
# Hierarchical — choose k=3 after inspecting dendrogram
hc = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels_hc = hc.fit_predict(X)
# DBSCAN — tune eps with a k-distance plot
db = DBSCAN(eps=0.5, min_samples=5)
labels_db = db.fit_predict(X) # -1 = noise