Eigenvectors: the directions a matrix doesn't turn
Most vectors get rotated when a matrix acts on them — eigenvectors are the stubborn exceptions, and that stubbornness is exactly why they run machine learning.
Apply a 2×2 matrix to a random vector and watch the arrow rotate. Apply it a thousand times and, as long as the matrix has one real eigenvalue larger in size than the others, the arrow's direction almost always settles onto the same line: a direction the matrix is incapable of turning. That line is the (dominant) eigenvector. The factor by which it grows or shrinks is the eigenvalue.
Mathematicians write this as Av = λv: the matrix A times a nonzero vector v equals a scalar λ times the same v. Swap the Greek for English and the equation says: this direction transforms into itself, just rescaled. Everything that feels abstract about eigenvectors follows from that one sentence.
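A quick way to make the definition concrete is to test it numerically. The sketch below, in plain Python with no libraries, multiplies a small example matrix (chosen only for illustration) by a candidate vector and reports λ only when every component of Av equals λ times the matching component of v. The helper names `matvec` and `eigenvalue_if_eigenvector` are made up for this example.

```python
def matvec(A, v):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def eigenvalue_if_eigenvector(A, v, tol=1e-12):
    """Return λ if A v = λ v for the nonzero vector v, otherwise None."""
    if all(x == 0 for x in v):
        raise ValueError("the zero vector is never counted as an eigenvector")
    Av = matvec(A, v)
    # Estimate λ from the largest-magnitude component of v (avoids dividing by 0).
    i = max(range(len(v)), key=lambda k: abs(v[k]))
    lam = Av[i] / v[i]
    # v is an eigenvector only if every component obeys (Av)_k = λ·v_k.
    if all(abs(Av[k] - lam * v[k]) <= tol for k in range(len(v))):
        return lam
    return None

A = [[2.0, 1.0],
     [1.0, 2.0]]

print(eigenvalue_if_eigenvector(A, [1.0, 1.0]))   # 3.0: A maps (1, 1) to (3, 3)
print(eigenvalue_if_eigenvector(A, [1.0, -1.0]))  # 1.0: A maps (1, -1) to itself
print(eigenvalue_if_eigenvector(A, [1.0, 0.0]))   # None: (2, 1) is off the (1, 0) line
```

The absolute tolerance `tol` is an arbitrary choice that suits small, well-scaled examples like this one; large-magnitude matrices would call for a relative tolerance.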
The transformation picture
Think of a matrix not as a table of numbers but as a recipe for deforming space. You pin the origin down, then every other point moves. A rotation matrix spins the plane. A shear matrix slants it. A projection matrix collapses it onto a lower-dimensional subspace, such as a line or a plane through the origin.
When a vector points in a generic direction, this deformation sends it somewhere new: different angle, different length. But many transformations also have one or more directions that are structurally fixed. The deformation can lengthen or shorten them; it cannot rotate them off their line. Those are the eigendirections. (A pure rotation by any angle other than 0° or 180° is the exception: it has no such real direction, which is why its eigenvalues are complex.)
The eigenvalue λ tells you what the stretching factor is. λ = 3 means the eigenvector is tripled. λ = 0.5 means it is halved. λ = -1 means it is flipped, still on the same line. λ = 0 means it is collapsed to zero — the matrix has a null direction.
A shear transformation turns most vectors off their original directions (shown as color shifts). The eigenvector e stays on its line and is only scaled by its eigenvalue λ; for a pure shear λ = 1, so e is left unchanged.
Why most transformations have these fixed directions
Here is a fact that surprises people: a real symmetric matrix (think the covariance matrix of a dataset) is guaranteed to have real eigenvalues and a full set of eigenvectors, and those eigenvectors can always be chosen perpendicular to each other (eigenvectors with different eigenvalues are automatically perpendicular). This is the spectral theorem, and it is the reason so much of statistics works at all.
A general square matrix might have complex eigenvalues — rotations with no real fixed direction. But symmetric matrices, the ones that arise naturally from data (covariance, correlation, Gram matrices), always decompose cleanly into real, orthogonal eigenvectors. The universe of data analysis is built on that guarantee.
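For 2×2 matrices the whole story fits in one quadratic. The eigenvalues are the roots of λ² - tr(A)·λ + det(A) = 0, so they are real exactly when the discriminant tr(A)² - 4·det(A) is nonnegative. For a symmetric matrix [[a, b], [b, d]] that discriminant simplifies to (a - d)² + 4b², which can never be negative. The plain-Python sketch below checks a rotation, a shear, and a symmetric matrix, then confirms that the symmetric matrix's two eigenvectors are perpendicular. The helper names are our own; in practice you would call a library routine such as NumPy's `numpy.linalg.eigh` for symmetric input.

```python
import math

def eig2(A):
    """Real eigenvalues of a 2x2 matrix [[a, b], [c, d]], larger first, or None if complex.

    They are the roots of λ² - (a + d)·λ + (ad - bc) = 0.
    """
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        return None  # complex pair: no real direction stays on its line
    r = math.sqrt(disc)
    return ((tr + r) / 2, (tr - r) / 2)

def sym_eigvecs2(A):
    """Unit eigenvectors of a symmetric 2x2 matrix [[a, b], [b, d]], larger eigenvalue first.

    Its eigenvalues are (a + d)/2 ± sqrt(((a - d)/2)² + b²): always real, because the
    discriminant of the quadratic simplifies to (a - d)² + 4b² >= 0.
    """
    (a, b), (_, d) = A
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        return [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
    mid, half = (a + d) / 2, math.hypot((a - d) / 2, b)
    vecs = []
    for lam in (mid + half, mid - half):
        x, y = b, lam - a  # solves the first row: (a - lam)·x + b·y = 0
        n = math.hypot(x, y)
        vecs.append((x / n, y / n))
    return vecs

rotation = [[0.0, -1.0], [1.0, 0.0]]   # rotate by 90 degrees
shear = [[1.0, 1.0], [0.0, 1.0]]       # slant along the x-axis
symmetric = [[2.0, 1.0], [1.0, 2.0]]

print(eig2(rotation))   # None
print(eig2(shear))      # (1.0, 1.0): one repeated eigenvalue, eigenvector (1, 0)
print(eig2(symmetric))  # (3.0, 1.0)

u, v = sym_eigvecs2(symmetric)
print(u, v, u[0] * v[0] + u[1] * v[1])  # dot product 0.0: perpendicular
```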
PCA is just an eigendecomposition
Principal component analysis (PCA) — the workhorse of dimensionality reduction — reduces entirely to finding eigenvectors of the covariance matrix. That is not a simplification. That is the algorithm.
The covariance matrix C of your dataset encodes how every pair of features varies together. Its eigenvectors, ranked by eigenvalue, are the directions in feature space along which the data spreads, from the most spread to the least. The corresponding eigenvalue tells you exactly how much variance lives along that direction: it is the variance of the data projected onto that eigenvector.
Concretely: if you have 50 features and the first two eigenvectors of C together account for 87% of the total variance (the sum of all eigenvalues), you can project onto those two directions and keep 87% of the variance. The other 48 directions together hold the remaining 13%; whether that residue is noise or faint signal is something the eigenvalues alone cannot tell you.
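Here is the whole pipeline at toy scale, as a sketch in plain Python: center the data, build the 2×2 sample covariance matrix, take its eigenpairs, and read off each component's share of the total variance (the sum of the eigenvalues, which equals the trace of C). The ten points are a small illustrative dataset, not real measurements, and the helper names are hypothetical. With 50 features you would use a linear-algebra library instead of the closed-form 2×2 formula.

```python
import math

def covariance_2d(points):
    """Sample covariance matrix (n - 1 denominator) of a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def sym_eig2(C):
    """Eigenpairs (value, unit vector) of a symmetric 2x2 matrix, largest value first."""
    (a, b), (_, d) = C
    if b == 0:
        # Already diagonal: eigenvalues on the diagonal, eigenvectors along the axes.
        return sorted([(a, (1.0, 0.0)), (d, (0.0, 1.0))], reverse=True)
    mid, half = (a + d) / 2, math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mid + half, mid - half):
        x, y = b, lam - a  # solves the first row: (a - lam)·x + b·y = 0
        norm = math.hypot(x, y)
        pairs.append((lam, (x / norm, y / norm)))
    return pairs

# Small illustrative dataset: two strongly correlated features.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

C = covariance_2d(data)
(lam1, pc1), (lam2, pc2) = sym_eig2(C)
total = lam1 + lam2  # equals the trace C[0][0] + C[1][1], the total variance
print(f"PC1 = ({pc1[0]:.3f}, {pc1[1]:.3f}) explains {lam1 / total:.1%} of the variance")
print(f"PC2 = ({pc2[0]:.3f}, {pc2[1]:.3f}) explains {lam2 / total:.1%} of the variance")
```

Both entries of PC1 come out nonzero: the component is a weighted blend of the two features, not a choice of one of them.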
The reason PCA is so powerful is that real datasets are almost never uniformly random in 50 dimensions. Features correlate. That correlation structure shows up in the off-diagonal entries of C, and the eigenvectors of C are precisely the directions that diagonalize that structure — the axes along which the dataset’s “natural shape” is aligned.
The leading eigenvector is the axis of maximum variance. The second eigenvector is the axis of maximum variance subject to being perpendicular to the first. And so on. You are finding the skeleton of the data cloud, one direction at a time.
PCA finds the eigenvectors of the covariance matrix. PC1 (high eigenvalue) is the long axis of the data ellipse — maximum spread. PC2 (low eigenvalue) is perpendicular — the remaining spread.
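The claim that the leading eigenvector is the axis of maximum variance can be checked by brute force. For a unit direction u, the variance of the data projected onto u is uᵀCu, so the sketch below scans directions 0.01° apart and compares the best and worst against the closed-form eigenvalues. The covariance matrix is made up for illustration. In two dimensions the "perpendicular to the first" constraint leaves only one option for the second axis, and it is also where the projected variance is smallest.

```python
import math

# Hypothetical covariance matrix for two correlated features.
C = [[3.0, 1.0],
     [1.0, 2.0]]

def projected_variance(C, theta):
    """Variance along the unit direction u = (cos θ, sin θ), computed as uᵀ C u."""
    u = (math.cos(theta), math.sin(theta))
    return sum(u[i] * C[i][j] * u[j] for i in range(2) for j in range(2))

# Scan directions 0.01° apart over [0°, 180°); u and -u give the same variance.
angles = [math.radians(k / 100) for k in range(18000)]
variances = [projected_variance(C, t) for t in angles]
best = max(range(len(angles)), key=variances.__getitem__)
worst = min(range(len(angles)), key=variances.__getitem__)

# Closed-form eigenvalues of the symmetric 2x2 matrix, for comparison.
(a, b), (_, d) = C
mid, half = (a + d) / 2, math.hypot((a - d) / 2, b)
lam1, lam2 = mid + half, mid - half

print(f"max {variances[best]:.6f} at {math.degrees(angles[best]):.2f} deg; lambda1 = {lam1:.6f}")
print(f"min {variances[worst]:.6f} at {math.degrees(angles[worst]):.2f} deg; lambda2 = {lam2:.6f}")
```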
The eigenvalue is a vote
Think about what happens when you multiply a matrix by itself repeatedly: A, then A², then A³. Each application stretches or compresses every eigenvector by its eigenvalue, so after n steps the eigenvector with eigenvalue λ has been scaled by λⁿ.
If λ = 1.1, after 100 steps that direction has grown by a factor of 1.1¹⁰⁰, roughly 13,780. If λ = 0.9, it has shrunk to 0.9¹⁰⁰, about 0.000027. In the limit, only components with |λ| exactly 1 keep a finite, nonzero size; everything else either explodes or vanishes, and the vector's direction is taken over by the eigenvector with the largest |λ|.
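The λⁿ arithmetic can be watched directly. The matrix below is an illustrative choice whose eigenvalues are exactly the two from the text: 1.1 along (1, 1) and 0.9 along (1, -1). Starting from (1, 0), which is half of each eigenvector, the sketch applies the matrix 100 times and then splits the result back into the two eigen-directions.

```python
def matvec(A, v):
    """Multiply a 2x2 matrix (list of rows) by a 2-vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

# Illustrative matrix: eigenvalue 1.1 along (1, 1) and 0.9 along (1, -1).
A = [[1.0, 0.1],
     [0.1, 1.0]]

x = [1.0, 0.0]  # = 0.5·(1, 1) + 0.5·(1, -1)
for _ in range(100):
    x = matvec(A, x)

# Split the result back into its components along the two eigenvectors.
grow = (x[0] + x[1]) / 2    # coefficient on (1, 1): started at 0.5
shrink = (x[0] - x[1]) / 2  # coefficient on (1, -1): started at 0.5
print(grow, 0.5 * 1.1 ** 100)    # both about 6890
print(shrink, 0.5 * 0.9 ** 100)  # both about 1.3e-05
print(x[1] / x[0])               # very close to 1: the arrow now lies on the (1, 1) line
```

The last line is the "arrow collapses onto one line" effect: the 1.1 component has swamped the 0.9 component.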
This is exactly the calculation behind Google’s original PageRank. The web’s link structure defines a transition matrix: entry (i, j) is the probability of following a link from page j to page i. After a damping adjustment (with some fixed probability the surfer jumps to a random page instead of following a link, which guarantees a single answer), multiplying a probability vector by this matrix over and over, a procedure called power iteration, converges to the matrix’s dominant eigenvector: the one with eigenvalue exactly 1. That converged vector is the PageRank score. Every page’s importance is its coordinate in the web’s dominant eigenvector.
The claim is bold but precise: PageRank is an eigenvector. The internet’s original ranking algorithm is a fixed point of a linear transformation, found by repeated multiplication.
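A toy version of the computation, as a sketch: the four-page link graph below is invented, and the damping factor 0.85 is the value usually quoted from the original PageRank paper. Each step multiplies the current score vector by the damped transition matrix without ever building the matrix explicitly; the loop stops at a probability vector that one more step leaves unchanged, which is the eigenvector with eigenvalue 1.

```python
# Invented 4-page web: links[j] lists the pages that page j links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = len(links)
damping = 0.85  # value commonly quoted from the original PageRank paper

def pagerank_step(r):
    """One multiplication by the damped transition matrix, done without building it."""
    new = [(1 - damping) / n] * n  # random-jump share, spread evenly over all pages
    for j, outs in links.items():
        for i in outs:
            # Page j sends its score evenly along its out-links: entry (i, j) = 1 / len(outs).
            new[i] += damping * r[j] / len(outs)
    return new

r = [1 / n] * n  # start from the uniform probability vector
for _ in range(100):  # power iteration
    r = pagerank_step(r)

print([round(s, 4) for s in r])
print(sum(r))  # stays 1 (up to rounding): probability vectors map to probability vectors
```

Every page in this toy graph has at least one out-link. Real link graphs contain pages with none, and a full implementation needs an extra rule for them.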
Stability lives in the eigenvalues
Control engineers and economists use the same idea. A dynamical system — a robot arm, a macroeconomic model, a neural network’s gradient flow — can often be linearized to x(t+1) = Ax(t). The question “will this system blow up or settle down?” reduces entirely to the eigenvalues of A.
If every eigenvalue has absolute value strictly less than 1, every direction eventually contracts and the system settles to its equilibrium. If any eigenvalue has absolute value greater than 1, that direction grows without bound and the system is unstable. The boundary case, eigenvalues of absolute value exactly 1 (such as 1, -1, or a complex pair on the unit circle), is the interesting regime where cycles and marginal stability live.
This is why practitioners who work with recurrent neural networks obsess over the spectral radius (the largest absolute eigenvalue) of their recurrent weight matrices. A spectral radius well above 1 tends to cause exploding gradients; well below 1, vanishing gradients. The usable regime is a narrow band near 1, and much of the training difficulty of deep RNNs is the challenge of staying in it.
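The eigenvalue test for x(t+1) = Ax(t) is easy to try on 2×2 systems. The sketch below computes the spectral radius from the characteristic quadratic (for a complex pair λ and its conjugate, their product equals det(A), so |λ| = √det(A)) and then simulates 200 steps. The three matrices are made up: one with both eigenvalues inside the unit circle, one with an eigenvalue of 1.2, and a 90° rotation whose eigenvalues ±i sit exactly on the circle.

```python
import math

def spectral_radius_2x2(A):
    """Largest |λ| of a real 2x2 matrix, including the complex-conjugate case."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc >= 0:
        r = math.sqrt(disc)
        return max(abs((tr + r) / 2), abs((tr - r) / 2))
    # Complex pair λ and its conjugate: their product is det, so |λ| = sqrt(det).
    return math.sqrt(det)

def final_norm(A, x, steps):
    """Iterate x(t+1) = A x(t) for the given number of steps; return the final length."""
    for _ in range(steps):
        x = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return math.hypot(*x)

systems = {
    "contracting": [[0.5, 0.3], [0.0, 0.9]],   # eigenvalues 0.5 and 0.9
    "unstable": [[1.2, 0.0], [0.1, 0.7]],      # eigenvalues 1.2 and 0.7
    "rotation": [[0.0, -1.0], [1.0, 0.0]],     # eigenvalues ±i, on the unit circle
}
for name, A in systems.items():
    radius = spectral_radius_2x2(A)
    print(f"{name:12} radius {radius:.3f}  |x(200)| = {final_norm(A, [1.0, 1.0], 200):.3g}")
```

For non-symmetric matrices the state can grow for a while before it decays even when the spectral radius is below 1; the eigenvalues describe the long-run behavior, not every individual step.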
What eigenvectors are not
Eigenvectors are not the “important features.” That conflation is seductive but wrong. PCA’s principal components are linear combinations of your original features. They are directions in the space spanned by those features, not the features themselves. The first principal component of a 50-feature dataset is a weighted sum of all 50 features, not a winner-takes-all selection.
Eigenvectors are also not unique. If v is an eigenvector, so is any nonzero multiple, such as 2v or -v. The direction is what matters, not the specific vector. And when two independent eigenvectors share the same eigenvalue (a degenerate eigenspace), any nonzero linear combination of them is also an eigenvector. The eigenspace is a whole subspace, not a single line.
This matters practically. When you run PCA on the same dataset with two different libraries (or library versions), you may get principal components with opposite signs; the sign is arbitrary. When eigenvalues are tied, implementations may even return different, equally valid bases for the same degenerate eigenspace. Software packages handle this quietly, but you need to know it so you are not confused when two implementations give you PC1 pointing in opposite directions.
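Both kinds of non-uniqueness are easy to see numerically. The matrix below is an illustrative example: it has eigenvalue 5 along (1, 1, 1) and a repeated eigenvalue 2 whose eigenspace is the whole plane of vectors whose components sum to zero. The `canonical_sign` helper shows one hypothetical way to compare eigenvectors across implementations (flip each vector so its largest-magnitude component is positive); it is an illustrative convention, not any particular library's rule.

```python
def matvec(A, v):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def is_eigenvector(A, v, lam, tol=1e-12):
    """True if v is nonzero and A v = lam·v, component by component, within tol."""
    return any(x != 0 for x in v) and all(
        abs(y - lam * x) <= tol for x, y in zip(v, matvec(A, v)))

# Illustrative matrix: eigenvalue 5 along (1, 1, 1); eigenvalue 2 on the whole
# plane of vectors whose components sum to zero (a degenerate eigenspace).
A = [[3.0, 1.0, 1.0],
     [1.0, 3.0, 1.0],
     [1.0, 1.0, 3.0]]
e1, e2 = [1.0, -1.0, 0.0], [1.0, 0.0, -1.0]

print(is_eigenvector(A, [1.0, 1.0, 1.0], 5.0))          # True
print(is_eigenvector(A, [-3.0 * x for x in e1], 2.0))   # True: any nonzero multiple works
mix = [0.6 * x + 0.8 * y for x, y in zip(e1, e2)]
print(is_eigenvector(A, mix, 2.0))                      # True: any combination in the eigenspace
print(is_eigenvector(A, [1.0, 0.0, 0.0], 2.0))          # False: not in that plane

def canonical_sign(v):
    """Hypothetical convention: flip v so its largest-magnitude component is positive."""
    i = max(range(len(v)), key=lambda k: abs(v[k]))
    return [-x for x in v] if v[i] < 0 else list(v)

pc1_run_a = [0.68, 0.73]    # made-up outputs of two implementations,
pc1_run_b = [-0.68, -0.73]  # identical up to sign
print(canonical_sign(pc1_run_a) == canonical_sign(pc1_run_b))  # True
```

A sign rule fixes only the ±v ambiguity. When eigenvalues are tied, the meaningful comparison between two implementations is whether their vectors span the same subspace, not whether the individual vectors match.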
The covariance matrix is not the data
A subtlety that trips up students: you never “see” the covariance matrix in its eigenvector basis until after the decomposition. What you have is a matrix of covariances between features. What you want is a coordinate system in which those covariances disappear, one in which the data’s variance is maximally concentrated in the fewest axes. The eigenvectors are that coordinate system.
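The claim that the eigenvectors form the coordinate system in which the covariances disappear can be checked directly: express each point in the (PC1, PC2) basis and recompute the covariance. The sketch uses a small illustrative dataset and hypothetical helper names. The off-diagonal entry of the new covariance matrix comes out at rounding-error size, and the diagonal holds the eigenvalues.

```python
import math

# Small illustrative dataset with two strongly correlated features.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

def covariance_2d(points):
    """Sample covariance matrix (n - 1 denominator) of a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

C = covariance_2d(data)
(a, b), (_, d) = C
# Eigenpairs of the symmetric 2x2 covariance matrix (b != 0 for this data).
mid, half = (a + d) / 2, math.hypot((a - d) / 2, b)
basis = []
for lam in (mid + half, mid - half):
    x, y = b, lam - a  # solves (a - lam)·x + b·y = 0
    norm = math.hypot(x, y)
    basis.append((x / norm, y / norm))

# Coordinates of every point in the eigenvector basis: (p · PC1, p · PC2).
rotated = [(px * basis[0][0] + py * basis[0][1], px * basis[1][0] + py * basis[1][1])
           for px, py in data]
C_new = covariance_2d(rotated)
print(C)      # sizable off-diagonal entries: the features are correlated
print(C_new)  # off-diagonal entries at rounding-error size; the diagonal holds the eigenvalues
```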
Projecting your data onto the top k eigenvectors does not lose arbitrary information. It loses specifically the information that lived in the lowest-variance directions. If those directions were noise, you have denoised your data. If they were signal, you have discarded it. The eigenvalue spectrum — the sorted list of eigenvalues — is the diagnostic. A sharp drop after the first few eigenvalues means the data is genuinely low-dimensional. A flat spectrum means no compression is safe.
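A toy calculation shows exactly what a projection throws away. Keep k = 1 component, map each point back to two dimensions, and measure the squared reconstruction error. Each centered point p loses ‖p‖² - (p·u)² when projected onto the unit vector u, so summing over points and dividing by n - 1 gives tr(C) - uᵀCu = (λ₁ + λ₂) - λ₁ = λ₂: the lost variance is exactly the discarded eigenvalue. The data and variable names below are illustrative.

```python
import math

# Small illustrative dataset with two strongly correlated features.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
centered = [(x - mx, y - my) for x, y in data]

# Sample covariance matrix [[sxx, sxy], [sxy, syy]] and its eigenvalues.
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)
mid, half = (sxx + syy) / 2, math.hypot((sxx - syy) / 2, sxy)
lam1, lam2 = mid + half, mid - half

# Unit eigenvector for lam1 (this formula needs sxy != 0, true for this data).
ux, uy = sxy, lam1 - sxx
norm = math.hypot(ux, uy)
ux, uy = ux / norm, uy / norm

# Keep k = 1: project each point onto PC1, map it back to 2-D, and sum the squared error.
lost = 0.0
for px, py in centered:
    score = px * ux + py * uy
    lost += (px - score * ux) ** 2 + (py - score * uy) ** 2

print(lost / (n - 1), lam2)  # the two numbers agree: the lost variance is λ2
```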
Plot the eigenvalue spectrum before you choose k. That plot is more informative than the two-dimensional scatter of PC1 vs PC2 that textbooks show first.
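Even without a plotting library, a text "scree plot" shows the shape of the spectrum. The eigenvalues below are made up to show a sharp drop after the second one. The `smallest_k` helper, which picks the smallest k reaching a target share of variance, is one common rule of thumb rather than the only way to choose k, and the 90% threshold is an arbitrary example.

```python
# Made-up eigenvalues of a covariance matrix, sorted largest first.
eigenvalues = [9.5, 4.2, 0.5, 0.3, 0.2, 0.2, 0.1]

def scree(eigenvalues, width=40):
    """Print one text bar per eigenvalue plus the cumulative share of variance."""
    total, running = sum(eigenvalues), 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        running += lam
        bar = "#" * round(width * lam / eigenvalues[0])
        print(f"{k:>2} {lam:6.2f} {bar:<{width}} {running / total:6.1%}")

def smallest_k(eigenvalues, share):
    """Smallest k whose top-k eigenvalues reach `share` of the total variance."""
    total, running = sum(eigenvalues), 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        running += lam
        if running >= share * total:
            return k
    return len(eigenvalues)

scree(eigenvalues)
print(smallest_k(eigenvalues, 0.9))  # 2: the sharp drop after the second eigenvalue
```

On a flat spectrum, such as seven equal eigenvalues, the same rule returns k = 7: no smaller projection reaches 90%.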
The deeper point
Matrices look like tables of numbers. Eigenvectors reveal that they are actually transformations with a skeleton — a set of invariant directions around which all the rotation, stretching, and shearing is organized. The rest of the transformation is built out of those directions the way a chord is built from notes.
That geometric reality is why eigenvectors appear everywhere that linear transformations appear: data analysis, physics, signal processing, graph theory, optimization. You are never computing them for their own sake. You are asking: what are the natural axes of this transformation? And the transformation answers by showing you the directions it refuses to turn.
That stubbornness is the entire secret.