datarekha

Hierarchical Clustering & Linkage

Build clusters by merging: start with every point alone and repeatedly join the two closest. The linkage rule decides what closest means — single is min, complete is max.

7 min read Intermediate GATE DA Lesson 95 of 122

What you'll learn

  • Agglomerative clustering is bottom-up (merge); divisive is top-down (split)
  • Single linkage = minimum pairwise distance; complete linkage = maximum pairwise distance
  • The merge history is drawn as a dendrogram
  • Finding the first merge: compute pairwise distances and join the closest pair

Before you start

k-means needs you to pick k up front and gives a flat set of clusters. Hierarchical clustering does neither — it builds a whole tree of nested clusters, so you can cut it at any level afterward. The usual flavour is agglomerative: start with every point as its own tiny cluster, then repeatedly fuse the two closest clusters until only one remains. That nested tree is why biologists use it for gene-expression and phylogeny heatmaps, and why it is the default when you do not know how many clusters you want in advance.

Bottom-up vs top-down

There are two directions:

  • Agglomerative (bottom-up) — begin with n singleton clusters, repeatedly merge the two nearest. This is the common one.
  • Divisive (top-down) — begin with one giant cluster holding everything, repeatedly split it.

The full merge history is recorded as a dendrogram — a tree whose join heights show how far apart the clusters were when they fused.

P1P2P4P3merge 1 (closest)0dist ↑
A dendrogram: the lowest join is the closest (first) merge; higher joins fuse clusters that are farther apart.

Linkage — what does “distance between two clusters” mean?

Two points have one obvious distance. But two clusters (each a bag of points) do not — so we need a rule, called the linkage:

  • Single linkage = the MINIMUM distance between any one point in cluster A and any one point in cluster B (the two nearest neighbours across the gap).
  • Complete linkage = the MAXIMUM such distance (the two farthest points).
Single linkage = MINclosest cross-cluster pairComplete linkage = MAXfarthest cross-cluster pair
Single linkage measures clusters by their nearest points (can produce long, chained clusters); complete linkage by their farthest points (produces compact, tight clusters).

Because single linkage only needs one close pair to merge, it can chain points into long straggly clusters; complete linkage demands the whole far end be close too, so it builds compact clusters.

How GATE asks this

Two recurring shapes. First, a NAT: given a few points, compute the pairwise distances and report which pair merges first (the smallest distance) or that smallest distance itself. Second, an MSQ testing the linkage definitions — GATE DA 2025 asked you to identify that single linkage uses the minimum and complete linkage the maximum pairwise distance. GATE DA 2026 asked the first-merge question.

Worked example — a real 2026 question (Manhattan distance)

Points P1 = (2, 3, −1), P2 = (3, 1, 1), P3 = (5, −2, 3), P4 = (3, 3, 3). Using Manhattan (L1) distance, which pair merges first in agglomerative clustering?

With every point its own cluster, the first merge is just the closest pair of points. Manhattan distance sums the absolute coordinate differences. Compute the candidates:

d(P2, P4) = |3−3| + |1−3| + |1−3| = 0 + 2 + 2 = 4
d(P1, P2) = |2−3| + |3−1| + |−1−1| = 1 + 2 + 2 = 5
d(P1, P4) = |2−3| + |3−3| + |−1−3| = 1 + 0 + 4 = 5
d(P3, P4) = |5−3| + |−2−3| + |3−3| = 2 + 5 + 0 = 7

The smallest distance is d(P2, P4) = 4, so P2 and P4 merge first. This is a real GATE DA 2026 question. And recall the GATE DA 2025 fact: single linkage = min, complete linkage = max.

Quick check

Quick check

0/6
Q1Points P2 = (3, 1, 1) and P4 = (3, 3, 3). What is their Manhattan (L1) distance? (the 2026 PYQ first-merge value)numerical answer — type a number
Q2Cluster A = {(0,0), (0,2)} and cluster B = {(5,0), (5,4)}, using Euclidean distance. What is the SINGLE-linkage distance between A and B?numerical answer — type a number
Q3For the same clusters A = {(0,0), (0,2)} and B = {(5,0), (5,4)}, what is the COMPLETE-linkage distance? (round to 2 decimals)numerical answer — type a number
Q4Which statements about linkage are correct? (select all that apply)select all that apply
Q5Which statements about hierarchical clustering are TRUE? (select all that apply)select all that apply
Q6In single-linkage agglomerative clustering on a set of singleton points, which pair is merged in the very first step?

Practice this in an interview

All questions
How do hierarchical clustering and DBSCAN differ from k-means?

Hierarchical clustering builds a tree of nested merges or splits and does not require specifying k upfront, but it is O(n² log n) and cannot revise early decisions. DBSCAN finds arbitrarily shaped clusters by density reachability, naturally marks outliers as noise, and also needs no k — but its results are sensitive to the eps and min_samples hyperparameters.

How does k-means clustering work?

K-means partitions n points into k clusters by alternating between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. It repeats until assignments stop changing, which guarantees convergence but not a globally optimal solution.

What are the main limitations of k-means clustering?

K-means requires specifying k upfront, assumes clusters are convex and roughly equal in size and density, is sensitive to outliers and feature scale, and can converge to local minima. It struggles with non-globular shapes such as rings or crescents, and it assigns every point to exactly one cluster with no notion of uncertainty.

How does the curse of dimensionality affect KNN?

In high-dimensional spaces all pairwise distances concentrate around the same value, so the concept of a 'nearest' neighbour breaks down — the k-th nearest neighbour is almost as far as every other point. KNN's accuracy degrades sharply as dimensionality increases unless the data has much lower intrinsic dimensionality.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content