How does hierarchical clustering work, and how do you decide the number of clusters from a dendrogram?

Agglomerative hierarchical clustering starts with each point as its own cluster and repeatedly merges the two closest clusters using a linkage rule (single, complete, average, or Ward) until one cluster remains, producing a dendrogram. You choose the number of clusters by cutting the dendrogram just below a sharp jump in merge height, since a large jump indicates that dissimilar groups are being joined. Unlike k-means, it needs no preset k, but it is computationally expensive.
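The "cut where the merges jump" rule can be sketched in a few lines. This is a minimal illustration rather than a definitive recipe: the three-blob dataset, the choice of Ward linkage, and the "largest gap between consecutive merge heights" heuristic are all assumptions made for the example, and SciPy's `linkage` and `fcluster` stand in for whatever dendrogram tool you use.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

# Three well-separated blobs (illustrative data, not from the lesson).
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=0.5, random_state=0)

Z = linkage(X, method="ward")   # one row per merge: (cluster_a, cluster_b, height, size)
heights = Z[:, 2]               # merge heights, non-decreasing for Ward linkage

# Find the largest jump between consecutive merge heights.
gaps = np.diff(heights)
i = int(np.argmax(gaps))        # the jump sits between merge i and merge i + 1
n_clusters = len(X) - (i + 1)   # clusters remaining after merges 0..i

# Cut halfway up the jump and read off flat cluster labels.
cut_height = (heights[i] + heights[i + 1]) / 2
labels = fcluster(Z, t=cut_height, criterion="distance")

print("suggested k:", n_clusters)
print("clusters after cut:", len(np.unique(labels)))
```

On real data the largest gap is only a starting point; several gaps of similar size usually mean the data supports more than one reasonable cut.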

When would you use DBSCAN instead of k-means, and what are its main limitations?

Use DBSCAN when clusters have arbitrary, non-spherical shapes, when the number of clusters is unknown, and when you need to detect outliers, since it groups by density and labels low-density points as noise. Its main limitations are sensitivity to the eps and minPts parameters and difficulty when clusters have very different densities. It also struggles in high dimensions where distance becomes unreliable.
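To make the parameter sensitivity concrete, here is a hedged sketch that reruns scikit-learn's `DBSCAN` on the same kind of two-moons data with three values of eps. The specific eps values and the dataset are assumptions chosen to show the extremes, not recommended settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=400, noise=0.06, random_state=0)

results = {}
for eps in (0.02, 0.18, 2.0):   # too small, moderate, too large (illustrative values)
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels) - {-1})     # -1 is the noise label, not a cluster
    n_noise = int(np.sum(labels == -1))
    results[eps] = (n_clusters, n_noise)
    print(f"eps={eps:<5} clusters={n_clusters:<3} noise points={n_noise}")
```

A radius that is too small leaves most points without enough neighbors, so they are labeled noise; a radius larger than the gap between groups fuses everything into one cluster.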

How do hierarchical clustering and DBSCAN differ from k-means?

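Hierarchical clustering builds a tree of nested merges or splits and does not require specifying k upfront, but it costs roughly O(n² log n) time with an efficient implementation and cannot revise early merge decisions. DBSCAN finds arbitrarily shaped clusters by density reachability, naturally marks outliers as noise, and also needs no k, but its results are sensitive to the eps and minPts hyperparameters (minPts is called min_samples in scikit-learn). K-means, by contrast, needs k upfront and assigns every point to the nearest of k centroids, which favors compact, roughly spherical clusters.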

What are the main limitations of k-means clustering?

K-means requires specifying k upfront, assumes clusters are convex and roughly equal in size and density, is sensitive to outliers and feature scale, and can converge to local minima. It struggles with non-globular shapes such as rings or crescents, and it assigns every point to exactly one cluster with no notion of uncertainty.
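The sensitivity to feature scale is easy to demonstrate. The sketch below uses made-up data: one informative feature that separates two groups and one uninformative feature measured on a much larger scale. The feature names, the scales, and the use of `StandardScaler` are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 200)                              # true group of each point

informative = rng.normal(loc=5.0 * y, scale=0.5)        # separates the two groups
nuisance = rng.normal(loc=0.0, scale=1000.0, size=400)  # pure noise on a huge scale
X = np.column_stack([informative, nuisance])

def kmeans_ari(data):
    """Cluster with k-means (k=2) and score agreement with the true groups."""
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
    return adjusted_rand_score(y, labels)

ari_raw = kmeans_ari(X)
ari_scaled = kmeans_ari(StandardScaler().fit_transform(X))

print(f"raw features:    ARI = {ari_raw:.3f}")
print(f"scaled features: ARI = {ari_scaled:.3f}")
```

Without scaling, Euclidean distance is dominated by the large-scale feature, so k-means splits along the noise; after standardization both features contribute comparably and the real grouping becomes visible.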

DBSCAN & hierarchical clustering — Machine Learning

k-means is fast and popular, but it rests on three rigid assumptions: clusters are round, clusters are similarly sized, and you already know how many there are. Break any of those and its results degrade, often badly. DBSCAN and hierarchical clustering are the more flexible alternatives for the cases k-means can’t handle.

DBSCAN — clustering by density

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) has a beautifully different idea: a cluster is a dense region of points. It uses two parameters:

eps — the neighborhood radius.
minPts — how many neighbors a point needs (within eps) to be a core point.

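The algorithm grows clusters by chaining core points together. A non-core point that lies within eps of a core point joins that cluster as a border point, and any point that’s neither a core point nor reachable from one is labeled noise. On two interleaved crescents, which would break k-means, DBSCAN traces each crescent and flags the strays as noise.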

That gives DBSCAN three powers k-means lacks:

Arbitrary shapes — it follows curves, rings, and blobs, because it grows by local density, not distance to a center.
No preset k — it discovers the number of clusters from the data.
Built-in outlier detection — sparse points become noise instead of being forced into a cluster.
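The core, border, and noise roles can be read directly off a fitted scikit-learn `DBSCAN` model. This is a small sketch under stated assumptions: the three stray points are invented for the example, and eps and min_samples are simply borrowed from the comparison later in this lesson. Note that scikit-learn's min_samples counts the point itself as part of its own neighborhood.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=400, noise=0.06, random_state=0)

# Add a few far-away strays (made-up points) so there is something to flag.
strays = np.array([[3.0, 3.0], [-3.0, -3.0], [3.0, -3.0]])
X = np.vstack([X, strays])

db = DBSCAN(eps=0.18, min_samples=5).fit(X)
labels = db.labels_                       # cluster id per point, -1 marks noise

is_core = np.zeros(len(X), dtype=bool)
is_core[db.core_sample_indices_] = True   # points with enough neighbors within eps
is_noise = labels == -1
is_border = ~is_core & ~is_noise          # in a cluster, but not dense enough to be core

n_clusters = len(set(labels) - {-1})      # discovered from the data, never passed in
print("clusters found:", n_clusters)
print("core:", int(is_core.sum()),
      "border:", int(is_border.sum()),
      "noise:", int(is_noise.sum()))
```

Nothing in the call specifies the number of clusters, and the strays end up with label -1 instead of being forced into the nearest crescent.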

Hierarchical clustering — a tree of clusters

The other shape-flexible approach builds a hierarchy. Agglomerative clustering starts with every point as its own cluster and repeatedly merges the two closest clusters until one remains, recording the whole sequence as a dendrogram — a tree. You then “cut” the tree at a height to get however many clusters you want.

The advantage: you don’t commit to k up front — you see the structure at every scale and pick the cut afterward. The cost: it’s slower (roughly O(n²) or worse), so it’s best for smaller datasets, and the linkage choice (how you measure distance between clusters — single, complete, average, Ward) changes the result.

from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=400, noise=0.06, random_state=0)

for name, model in [
    ("k-means",       KMeans(n_clusters=2, n_init="auto", random_state=0)),
    ("agglomerative", AgglomerativeClustering(n_clusters=2, linkage="single")),
    ("DBSCAN",        DBSCAN(eps=0.18, min_samples=5)),
]:
    labels = model.fit_predict(X)
    ari = adjusted_rand_score(y, labels)   # 1.0 = perfect recovery of the moons
    print(f"{name:14} ARI = {ari:.3f}")

print("\nk-means slices the moons (~0.2); density/single-linkage recover them (~1.0).")
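The comparison above fixes the agglomerative model to single linkage. As a hedged follow-up, the sketch below sweeps the four linkage rules named earlier on the same two-moons data to show how much the choice matters; the exact scores depend on the data and library version, so none are quoted here.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=400, noise=0.06, random_state=0)

scores = {}
for linkage in ("single", "complete", "average", "ward"):
    model = AgglomerativeClustering(n_clusters=2, linkage=linkage)
    scores[linkage] = adjusted_rand_score(y, model.fit_predict(X))
    print(f"{linkage:9} ARI = {scores[linkage]:.3f}")
```

Single linkage merges by nearest neighbor, so it can follow a thin curved shape, while complete, average, and Ward linkage favor compact groups and tend to cut across the crescents. On noisier data single linkage has its own failure mode: a thin bridge of points can chain two clusters together.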

In one breath

When clusters aren’t round blobs, density-based and hierarchical methods succeed where k-means fails.
DBSCAN grows clusters from dense regions (core points have ≥ minPts neighbors within eps), so it traces arbitrary shapes, discovers k itself, and flags sparse points as noise.
Its weakness is varying density — one global eps can’t fit both a dense and a sparse cluster (HDBSCAN helps).
Hierarchical clustering merges the closest clusters into a dendrogram you can cut at any height: you pick k after seeing structure at every scale, at a cost of roughly O(n²) or worse.
Rule of thumb: blobs + big data + known k → k-means; arbitrary shapes + outliers → DBSCAN; want the full hierarchy or small data → agglomerative. Always scale features first.

Quick check

Q1. What makes a point a 'core point' in DBSCAN?

Q2. Your data has two interleaved crescents. Why does DBSCAN succeed where k-means fails?

Q3. What is a dendrogram in hierarchical clustering?

To see high-dimensional clusters before you cluster them, project with t-SNE & UMAP. And to find the points that aren’t in any cluster, see anomaly detection.
