How does the Isolation Forest algorithm detect anomalies?
Isolation Forest builds many random trees by repeatedly picking a random feature and a random split value, partitioning the data until points are isolated. Anomalies get isolated in far fewer splits because they're rare and different, so their average path length across trees is short. The shorter the expected path length, the higher the anomaly score. Because it needs no distance or density computations, it is fast and often effective in high dimensions.
How to think about it
The crisp answer
Isolation Forest detects anomalies by how easy they are to isolate. It builds an ensemble of random binary trees; at each node it picks a random feature and a random split value. Anomalies, being few and different, get separated from the rest in very few splits, so their average path length to a leaf is short. Short path length → high anomaly score.
Why the intuition works
The insight (Liu, Ting & Zhou, 2008), described in the Analytics Vidhya Isolation Forest guide, flips the usual approach: instead of profiling normal points, it directly isolates outliers. A normal point sits in a dense region and needs many random cuts to wall off; an outlier in sparse space gets cut off almost immediately.
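To make the intuition concrete, here is a minimal from-scratch sketch that counts how many random axis-aligned cuts it takes to isolate one point. It is not the reference implementation: the names isolation_depth and mean_depth, the synthetic 2-D data, and the use of the full dataset on every trial (rather than a fresh subsample per tree) are all assumptions made for illustration.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count random axis-aligned splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))          # random feature
    values = [row[feature] for row in data]
    lo, hi = min(values), max(values)
    if lo == hi:                                 # cannot split on this feature (simplification: stop)
        return depth
    split = rng.uniform(lo, hi)                  # random split value
    # Keep only the side of the split that contains `point`.
    if point[feature] < split:
        side = [row for row in data if row[feature] < split]
    else:
        side = [row for row in data if row[feature] >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)

def mean_depth(point, data, rng, trials=200):
    """Average isolation depth over many independent random partitionings."""
    return sum(isolation_depth(point, data, rng) for _ in range(trials)) / trials

rng = random.Random(0)
# Hypothetical data: a dense Gaussian cluster plus one far-away point.
cluster = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(255)]
outlier = (6.0, 6.0)
data = cluster + [outlier]
# The cluster point closest to the origin stands in for a typical "normal" point.
inlier = min(cluster, key=lambda p: p[0] ** 2 + p[1] ** 2)

outlier_depth = mean_depth(outlier, data, rng)
inlier_depth = mean_depth(inlier, data, rng)
print(f"outlier: {outlier_depth:.1f} splits, inlier: {inlier_depth:.1f} splits")
```

The far-away point should need noticeably fewer cuts on average than the point in the dense center, which is exactly the gap the anomaly score is built on.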
The scoring in words
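For each point, average its path length across all trees to get E[h(x)], normalize by c(n), the expected path length for a sample of n points (n being the number of points each tree was built on), and convert to a score in (0, 1): s = 2^(-E[h(x)] / c(n)). Scores near 1 are anomalies; scores well below 0.5 are normal; if every score sits near 0.5, the data has no distinct anomalies. You set a threshold, often through an assumed contamination fraction, to label points.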
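The scoring step can be sketched in a few lines. This follows the normalization from the original paper, where c(n) is the average path length of an unsuccessful search in a binary search tree over n points and the harmonic number is approximated with a logarithm. The function names c and anomaly_score are illustrative choices, not a library API.

```python
import math

EULER_GAMMA = 0.5772156649015329

def c(n):
    """Expected path length for n points: 2*H(n-1) - 2*(n-1)/n, with H(i) ~ ln(i) + gamma."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """Map an average path length to a score in (0, 1); shorter paths give higher scores."""
    return 2.0 ** (-mean_path_length / c(n))

n = 256  # assumed number of points per tree
short_path = anomaly_score(3.0, n)     # isolated quickly -> score above 0.5
typical_path = anomaly_score(c(n), n)  # path equal to the expected length -> score of 0.5
long_path = anomaly_score(15.0, n)     # buried in a dense region -> score below 0.5
print(short_path, typical_path, long_path)
```

A point whose average path length equals c(n) lands exactly at 0.5, which is why 0.5 acts as the natural dividing line between "easier than average" and "harder than average" to isolate.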
Why it’s popular
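- Fast and scalable: each tree is built on a small subsample of the data, so training cost depends on the number of trees and the subsample size rather than the full dataset size; scoring is linear in the number of points, memory use is low, and trees can be built in parallel.
- No distance/density computation, so it often handles high dimensions better than LOF or other distance-based methods.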
- Few assumptions about the data distribution.
The common trap
Forgetting that it can struggle with local anomalies in clustered data, and that the contamination parameter (the assumed anomaly fraction) directly determines how many points get flagged. In scikit-learn, for example, contamination only moves the decision threshold and leaves the underlying scores unchanged, so set it from domain knowledge, not blindly. Axis-aligned random splits can also miss anomalies along diagonal directions (Extended Isolation Forest helps).
Follow-up: “vs one-class SVM?” Isolation Forest is usually faster and scales better to large, high-dimensional data; one-class SVM can capture more complex boundaries but is costlier to tune.
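To see what the contamination parameter actually changes, here is a hedged sketch using scikit-learn's IsolationForest. The synthetic data, the two contamination values, and the results dictionary are assumptions chosen for illustration; only the estimator and its fit, predict, and score_samples methods come from the library.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical data: a dense cluster plus a handful of widely scattered points.
normal = rng.normal(loc=0.0, scale=1.0, size=(490, 2))
scattered = rng.uniform(low=-8.0, high=8.0, size=(10, 2))
X = np.vstack([normal, scattered])

results = {}
for contamination in (0.01, 0.10):
    model = IsolationForest(n_estimators=100, contamination=contamination, random_state=0)
    model.fit(X)
    labels = model.predict(X)        # -1 = flagged as anomaly, 1 = normal
    scores = model.score_samples(X)  # lower = more anomalous (sign-flipped relative to the paper's score)
    results[contamination] = (scores, int((labels == -1).sum()))
    print(f"contamination={contamination}: {results[contamination][1]} points flagged")

# With the same random_state the trees, and therefore the raw scores, are identical;
# only the cut-off applied to those scores differs.
same_scores = np.allclose(results[0.01][0], results[0.10][0])
print("raw scores identical:", same_scores)
```

Because the raw scores do not move, one reasonable workflow is to inspect the score distribution first and pick the cut-off afterwards, instead of committing to a contamination value up front.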