How do you use a heatmap to visualize correlations, and what are its limitations?
The short answer
A correlation heatmap encodes the pairwise Pearson or Spearman correlation coefficients of a numeric feature matrix as a color grid, making it fast to spot highly correlated feature pairs. Its main limitations: it shows only linear (Pearson) or monotonic (Spearman) association, so it can miss other nonlinear structure, and it becomes hard to read once the feature count reaches roughly 20 to 30.
How to think about it
How a correlation heatmap works
Each cell (i, j) contains the correlation coefficient between feature i and feature j, ranging from -1 to +1. The diagonal is always 1.0 (self-correlation). The matrix is symmetric, so the upper triangle is the mirror of the lower; displaying only the lower triangle (or upper) reduces redundancy.
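The properties above can be checked on a small example. This is a minimal sketch on synthetic data; the column names `a` to `d` and the noise levels are invented for illustration and do not come from any real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "a": x,
    "b": 2.0 * x + rng.normal(scale=0.5, size=200),  # positively related to a
    "c": -x + rng.normal(scale=0.5, size=200),       # negatively related to a
    "d": rng.normal(size=200),                       # independent noise
})

corr = df.corr()  # Pearson by default

# Check the structural properties of the matrix.
is_symmetric = bool(np.allclose(corr.values, corr.values.T))
diagonal_is_one = bool(np.allclose(np.diag(corr.values), 1.0))
in_range = bool((np.abs(corr.values) <= 1.0 + 1e-12).all())

# Keep only the strictly lower triangle: the non-redundant cells.
lower_mask = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
lower_only = corr.where(lower_mask)
print(lower_only.round(2))
```

With four features the full matrix has 16 cells, but only the 6 below the diagonal carry distinct information, which is why a triangular display loses nothing.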
Color encoding:
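- Use a diverging palette (e.g., blue-white-red, such as matplotlib's RdBu_r) with white or neutral at zero, saturated blue at -1, and saturated red at +1. Fix the color limits at -1 and +1 so that zero always lands on the neutral midpoint.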
- Do NOT use a sequential palette: it has no neutral midpoint, so zero does not stand out and the sign of a correlation is hard to read at a glance.
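To see the difference between the two palette types, one can look up the colors each assigns to -1, 0, and +1. This is a rough sketch using matplotlib's built-in `RdBu_r` (diverging) and `Reds` (sequential) colormaps; the helper `rgb` is a hypothetical name introduced here, and the headless `Agg` backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Fixed limits so that a correlation of 0 maps to the middle of the palette.
norm = Normalize(vmin=-1.0, vmax=1.0)
diverging = plt.get_cmap("RdBu_r")
sequential = plt.get_cmap("Reds")

def rgb(cmap, r):
    """Return the (R, G, B) color that cmap assigns to correlation r."""
    rgba = cmap(float(norm(r)))
    return tuple(round(float(channel), 2) for channel in rgba[:3])

for r in (-1.0, 0.0, 1.0):
    print(f"r={r:+.0f}  diverging={rgb(diverging, r)}  sequential={rgb(sequential, r)}")
```

In the diverging map, zero comes out near-white and the two ends take opposite hues. In the sequential map, zero is simply a mid-intensity red, so nothing marks where the sign changes.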
Reading the heatmap
- Bright red clusters: highly collinear features, likely redundant for a linear model; one of each pair may be droppable.
- Bright blue clusters: strong inverse relationships — could indicate features that naturally offset each other.
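- A feature row that is neutral across all others: the feature has little linear association with the rest. That alone does not make it a weak predictor; it may carry independent information or relate to other variables nonlinearly.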
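Reading clusters by eye can be backed up by listing the strongest pairs directly. The sketch below is one possible way to do that; `strong_pairs`, the 0.8 threshold, and the column names are hypothetical choices made for illustration, and the data is synthetic.

```python
import numpy as np
import pandas as pd

def strong_pairs(corr: pd.DataFrame, threshold: float = 0.8) -> pd.Series:
    """Return feature pairs with |r| >= threshold, strongest first."""
    # Strictly upper triangle: each pair once, diagonal excluded.
    keep = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(keep).stack().dropna()
    pairs = pairs[pairs.abs() >= threshold]
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(1)
base = rng.normal(size=300)
df = pd.DataFrame({
    "height_cm": base,
    "height_in": base / 2.54,                            # exact rescaling
    "inverse": -base + rng.normal(scale=0.3, size=300),  # strong negative link
    "noise": rng.normal(size=300),                       # unrelated column
})

result = strong_pairs(df.corr(), threshold=0.8)
print(result)
```

The rescaled pair would show up as a bright red cell, the `inverse` column as bright blue against both height columns, and `noise` as a neutral row that never passes the threshold.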
Practical workflow
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# df is assumed to be an existing pandas DataFrame
corr = df.select_dtypes("number").corr()  # Pearson by default
mask = np.triu(np.ones_like(corr, dtype=bool))  # hide upper triangle and diagonal
sns.heatmap(corr, mask=mask, cmap="RdBu_r", vmin=-1, vmax=1,
            annot=True, fmt=".2f", linewidths=0.5)
plt.title("Pairwise Pearson Correlation")
plt.show()
Limitations
- Linear only: Pearson correlation is zero for y = x² when x is distributed symmetrically around zero, even though the relationship is deterministic. A scatter matrix or mutual information scores complement the heatmap.
- Scale: with 30+ features, cell text overlaps and colors blend. Cluster the features with hierarchical clustering (seaborn’s clustermap) to group related variables and make structure visible.
- Outlier sensitivity: Pearson is sensitive to outliers. Spearman rank correlation is more robust and is a better default for skewed financial or behavioral data.
- Causation: high correlation between two features says nothing about which causes which, or whether both are driven by a confound.
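Two of these limitations are easy to reproduce numerically. The following is a small sketch on synthetic data: the helper `pair_corr`, the sample sizes, and the outlier value of 50 are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

def pair_corr(frame: pd.DataFrame, method: str) -> float:
    """Correlation between the first two columns of frame."""
    return float(frame.corr(method=method).iloc[0, 1])

# 1) Linear only: y = x^2 with x symmetric around zero.
x = np.arange(-100, 101, dtype=float)
parabola = pd.DataFrame({"x": x, "y": x ** 2})
pearson_parabola = pair_corr(parabola, "pearson")
spearman_parabola = pair_corr(parabola, "spearman")

# 2) Outlier sensitivity: two independent columns plus one shared extreme point.
rng = np.random.default_rng(42)
a = rng.normal(size=100)
b = rng.normal(size=100)  # generated independently of a
with_outlier = pd.DataFrame({
    "a": np.append(a, 50.0),
    "b": np.append(b, 50.0),
})
pearson_outlier = pair_corr(with_outlier, "pearson")
spearman_outlier = pair_corr(with_outlier, "spearman")

print(f"parabola: pearson={pearson_parabola:.3f}, spearman={spearman_parabola:.3f}")
print(f"outlier:  pearson={pearson_outlier:.3f}, spearman={spearman_outlier:.3f}")
```

In the first case both coefficients come out close to zero, so neither a Pearson nor a Spearman heatmap would reveal the dependence. In the second, a single extreme point is enough to push Pearson close to 1 while Spearman stays small, because ranks cap how much any one observation can contribute.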