When would you use DBSCAN instead of k-means, and what are its main limitations?

For: Data Scientist, ML Engineer, Data Analyst

The short answer

Use DBSCAN when clusters have arbitrary, non-spherical shapes, when the number of clusters is unknown, or when you need to detect outliers: it groups points by density and labels low-density points as noise. Its main limitations are sensitivity to the eps and minPts parameters, difficulty when clusters have very different densities, and weaker results in high dimensions, where distances become less informative.

How to think about it

The crisp answer

Reach for DBSCAN when k-means’s assumptions fail: you have non-spherical clusters, you don’t know the number of clusters, or you expect outliers. DBSCAN groups points by density and explicitly labels sparse points as noise, so it finds arbitrary shapes and leaves outliers unassigned instead of forcing them into clusters.

Why it differs from k-means

K-means partitions every point into one of k round clusters around centroids. DBSCAN instead defines clusters as dense regions connected through core points (points with at least minPts neighbors within radius eps). As the Hex comparison of density-based methods explains, this lets it recover crescents, rings, and other shapes k-means cuts through, while reporting noise points separately.
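The core-point definition above can be sketched in a few lines of plain Python. This is a minimal, quadratic-time illustration and not a library implementation; the function names region_query and dbscan, the toy ring-and-blob data, and the choice to count a point as its own neighbor are all assumptions made for the example.

```python
from math import cos, sin, pi, dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (the point itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # only core points extend the cluster
                queue.extend(j_neighbors)
    return labels

# Toy data (hypothetical): a ring, a blob at its center, and one far-away point.
ring = [(5 * cos(2 * pi * t / 24), 5 * sin(2 * pi * t / 24)) for t in range(24)]
blob = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]
outlier = [(20.0, 20.0)]
points = ring + blob + outlier

labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

In this toy setup the ring and the blob share roughly the same center, so a centroid-based partition would have to cut through the ring, whereas the density-based pass keeps the ring whole and leaves the far-away point as noise.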

When to use it

Geospatial clustering, anomaly detection, customer segmentation where cluster count is unknown.
Data with noise you want flagged rather than absorbed.
Clusters of irregular shape and roughly similar density.

The main limitations

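Parameter sensitivity: eps and minPts are hard to set; a k-distance plot (each point's distance to its k-th nearest neighbor, sorted) helps pick eps, typically near the elbow of the curve.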
Varying densities: a single global eps can’t capture clusters that are dense in one region and sparse in another (HDBSCAN addresses this).
High dimensions: distances concentrate (the curse of dimensionality), so density estimates become unreliable.
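The k-distance heuristic mentioned above can be computed without any plotting library. The sketch below is a brute-force illustration on made-up data; the helper name k_distances, the choice k=4, and the two grids are assumptions for the example, not a recommended setting.

```python
from math import dist

def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)

# Hypothetical data: one tightly packed grid and one loosely packed grid.
dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sparse = [(10.0 + i, 10.0 + j) for i in range(5) for j in range(5)]

kd = k_distances(dense + sparse, k=4)

# Print a few points along the sorted curve; a real analysis would plot all of kd.
for rank in (0, 24, 25, 49):
    print(rank, round(kd[rank], 3))
```

On data like this the sorted curve shows two plateaus, one per density level, instead of a single elbow. That is a hint that any single eps will be a compromise: a value that suits the dense group is too small for the sparse one.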

The common trap

Forgetting that DBSCAN has no centroids, so it has no built-in way to assign new points without re-running, and that it struggles with multi-density data. Always scale features first, because eps is a single distance threshold applied across all features. Follow-up: “What if densities vary a lot?” Use HDBSCAN, which builds a hierarchy and extracts clusters at varying density levels, or hierarchical clustering when you want a dendrogram and no fixed k.
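To see why scaling matters for a distance-based method, the sketch below compares neighbor distances before and after z-score standardization. Standardization is only one possible scaling choice, and the helper standardize and the (age, income) rows are hypothetical values invented for the illustration.

```python
from math import dist
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((x - m) / s for x, m, s in zip(r, mus, sds)) for r in rows]

# Hypothetical (age in years, income in dollars) rows.
rows = [(25, 50_000), (65, 50_500), (26, 52_000), (40, 30_000), (55, 90_000)]
scaled = standardize(rows)

# Raw distances are dominated by income, simply because its numbers are larger.
raw_ab = dist(rows[0], rows[1])   # 40 years apart, 500 dollars apart
raw_ac = dist(rows[0], rows[2])   # 1 year apart, 2000 dollars apart

# After scaling, both features contribute on a comparable footing.
scaled_ab = dist(scaled[0], scaled[1])
scaled_ac = dist(scaled[0], scaled[2])

print(raw_ab < raw_ac, scaled_ab < scaled_ac)
```

On the raw values the first row looks closer to the person 40 years older than to the near-identical one, purely because of units. Since eps is compared against exactly these distances, the neighborhoods DBSCAN sees depend on how the features were scaled.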

