Hierarchical Clustering & Linkage
Build clusters by merging: start with every point alone and repeatedly join the two closest. The linkage rule decides what closest means — single is min, complete is max.
What you'll learn
- Agglomerative clustering is bottom-up (merge); divisive is top-down (split)
- Single linkage = minimum pairwise distance; complete linkage = maximum pairwise distance
- The merge history is drawn as a dendrogram
- Finding the first merge: compute pairwise distances and join the closest pair
Before you start
k-means needs you to pick k up front and gives a flat set of clusters.
Hierarchical clustering does neither — it builds a whole tree of nested
clusters, so you can cut it at any level afterward. The usual flavour is
agglomerative: start with every point as its own tiny cluster, then repeatedly
fuse the two closest clusters until only one remains. That nested tree is why
biologists use it for gene-expression and phylogeny heatmaps, and why it is the
default when you do not know how many clusters you want in advance.
Bottom-up vs top-down
There are two directions:
- Agglomerative (bottom-up) — begin with
nsingleton clusters, repeatedly merge the two nearest. This is the common one. - Divisive (top-down) — begin with one giant cluster holding everything, repeatedly split it.
The full merge history is recorded as a dendrogram — a tree whose join heights show how far apart the clusters were when they fused.
Linkage — what does “distance between two clusters” mean?
Two points have one obvious distance. But two clusters (each a bag of points) do not — so we need a rule, called the linkage:
- Single linkage = the MINIMUM distance between any one point in cluster A and any one point in cluster B (the two nearest neighbours across the gap).
- Complete linkage = the MAXIMUM such distance (the two farthest points).
Because single linkage only needs one close pair to merge, it can chain points into long straggly clusters; complete linkage demands the whole far end be close too, so it builds compact clusters.
How GATE asks this
Two recurring shapes. First, a NAT: given a few points, compute the pairwise distances and report which pair merges first (the smallest distance) or that smallest distance itself. Second, an MSQ testing the linkage definitions — GATE DA 2025 asked you to identify that single linkage uses the minimum and complete linkage the maximum pairwise distance. GATE DA 2026 asked the first-merge question.
Worked example — a real 2026 question (Manhattan distance)
Points
P1 = (2, 3, −1),P2 = (3, 1, 1),P3 = (5, −2, 3),P4 = (3, 3, 3). Using Manhattan (L1) distance, which pair merges first in agglomerative clustering?
With every point its own cluster, the first merge is just the closest pair of points. Manhattan distance sums the absolute coordinate differences. Compute the candidates:
d(P2, P4) = |3−3| + |1−3| + |1−3| = 0 + 2 + 2 = 4
d(P1, P2) = |2−3| + |3−1| + |−1−1| = 1 + 2 + 2 = 5
d(P1, P4) = |2−3| + |3−3| + |−1−3| = 1 + 0 + 4 = 5
d(P3, P4) = |5−3| + |−2−3| + |3−3| = 2 + 5 + 0 = 7
The smallest distance is d(P2, P4) = 4, so P2 and P4 merge first. This is a
real GATE DA 2026 question. And recall the GATE DA 2025 fact: single linkage = min,
complete linkage = max.
Quick check
Quick check
Practice this in an interview
All questionsHierarchical clustering builds a tree of nested merges or splits and does not require specifying k upfront, but it is O(n² log n) and cannot revise early decisions. DBSCAN finds arbitrarily shaped clusters by density reachability, naturally marks outliers as noise, and also needs no k — but its results are sensitive to the eps and min_samples hyperparameters.
K-means partitions n points into k clusters by alternating between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. It repeats until assignments stop changing, which guarantees convergence but not a globally optimal solution.
K-means requires specifying k upfront, assumes clusters are convex and roughly equal in size and density, is sensitive to outliers and feature scale, and can converge to local minima. It struggles with non-globular shapes such as rings or crescents, and it assigns every point to exactly one cluster with no notion of uncertainty.
In high-dimensional spaces all pairwise distances concentrate around the same value, so the concept of a 'nearest' neighbour breaks down — the k-th nearest neighbour is almost as far as every other point. KNN's accuracy degrades sharply as dimensionality increases unless the data has much lower intrinsic dimensionality.