Singular Value Decomposition
Every matrix factors as A = UΣVᵀ with orthogonal U, V and non-negative singular values on Σ; the singular values are the square roots of the eigenvalues of AᵀA.
What you'll learn
- SVD: A = UΣVᵀ with U, V orthogonal and Σ diagonal of non-negative singular values
- Singular values σᵢ = √(eigenvalues of AᵀA), always ≥ 0
- For a symmetric positive-semidefinite matrix the singular values equal the eigenvalues
- Number of nonzero singular values = rank(A); a rank-1 uuᵀ has one nonzero singular value = ‖u‖²
Before you start
Eigenvalues are a bit picky. They only describe square matrices, and even then they can come out negative or complex. The singular value decomposition doesn’t mind any of that. It works for any matrix — square or not — and produces a clean set of non-negative numbers, the singular values, that measure how much the matrix stretches space along its most important directions.
The factorisation reads A = UΣVᵀ. Geometrically it’s three simple moves: the
orthogonal Vᵀ rotates (or reflects) the input so that the columns of V line up with
the coordinate axes, the diagonal Σ stretches each axis by its singular value, and the
orthogonal U rotates (or reflects) the result into the output space. Rotate, stretch,
rotate: that’s every linear map.
A = UΣVᵀ
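To make the three moves concrete, here is a minimal sketch that builds U, Σ and V by hand for one 2-by-2 matrix and multiplies them back together. The helper names `svd_2x2` and `reconstruct` and the example matrix are illustrative assumptions, not part of the original lesson, and the sketch assumes an invertible matrix so that both singular values are positive.

```python
import math

def svd_2x2(A):
    """Return (U, sigmas, V) with A = U diag(sigmas) V^T for a real 2x2 matrix.

    Sketch only: assumes A is invertible, so both singular values are > 0.
    """
    (p, q), (r, s) = A
    # G = A^T A = [[a, b], [b, d]] is symmetric positive-semidefinite.
    a, b, d = p * p + r * r, p * q + r * s, q * q + s * s
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    # Singular values are the square roots of the eigenvalues of G.
    sigmas = [math.sqrt(mean + radius), math.sqrt(max(mean - radius, 0.0))]
    # The top eigenvector of G points at angle theta; the second is perpendicular.
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, sn = math.cos(theta), math.sin(theta)
    V = [[c, -sn], [sn, c]]  # columns are v1, v2
    # Each column of U is u_k = A v_k / sigma_k.
    U = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(2):
        vx, vy = V[0][k], V[1][k]
        U[0][k] = (p * vx + q * vy) / sigmas[k]
        U[1][k] = (r * vx + s * vy) / sigmas[k]
    return U, sigmas, V

def reconstruct(U, sigmas, V):
    """Multiply U diag(sigmas) V^T back together."""
    return [[sum(U[i][k] * sigmas[k] * V[j][k] for k in range(2))
             for j in range(2)] for i in range(2)]

A = [[3, 0], [4, 5]]
U, sigmas, V = svd_2x2(A)
A_rebuilt = reconstruct(U, sigmas, V)
print(sigmas)      # sqrt(45) and sqrt(5), up to rounding
print(A_rebuilt)   # matches A up to rounding
```

In practice a library routine such as NumPy's `numpy.linalg.svd` would do this for any shape; the hand-rolled version is only meant to show where each factor comes from.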
The central question on GATE is: where do the singular values come from? Form the
symmetric matrix AᵀA. It is always symmetric positive-semidefinite, so its eigenvalues
are real and ≥ 0. The singular values are their square roots:
σᵢ = √( λᵢ(AᵀA) ), σᵢ ≥ 0 always.
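The recipe also works for non-square matrices. As a rough sketch, the hypothetical helper below handles any real matrix with two columns: it forms the 2-by-2 matrix AᵀA, gets its eigenvalues from the closed-form formula for a symmetric 2-by-2 matrix, and takes square roots. The 3-by-2 example matrix is an assumption chosen so the numbers come out cleanly.

```python
import math

def singular_values_two_columns(A):
    """Singular values (largest first) of an m-by-2 real matrix given as a list of rows."""
    # Entries of the symmetric matrix G = A^T A = [[a, b], [b, d]].
    a = sum(row[0] * row[0] for row in A)
    b = sum(row[0] * row[1] for row in A)
    d = sum(row[1] * row[1] for row in A)
    # Eigenvalues of a symmetric 2x2 matrix: mean +/- sqrt(((a - d) / 2)^2 + b^2).
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    eigenvalues = [mean + radius, mean - radius]
    # Clamp tiny negative round-off before taking the square root.
    return [math.sqrt(max(lam, 0.0)) for lam in eigenvalues]

A = [[1, 0],
     [0, 1],
     [1, 1]]
sigmas = singular_values_two_columns(A)
print(sigmas)  # A^T A = [[2, 1], [1, 2]] has eigenvalues 3 and 1, so sqrt(3) and 1
```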
Three consequences worth memorising:
- Always non-negative. Even if A has negative or complex eigenvalues, every σᵢ is a real number ≥ 0, because it is a square root of a non-negative eigenvalue of AᵀA.
- Symmetric PSD case. If A is symmetric positive-semidefinite, its singular values equal its eigenvalues (σᵢ = λᵢ). This is the only general case where they coincide.
- Rank. The number of nonzero singular values equals rank(A): Σ has exactly rank(A) nonzero entries on its diagonal.
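The three consequences can be checked on small examples. The sketch below uses a hypothetical 2-by-2 helper and four matrices picked purely for illustration: a rotation (complex eigenvalues), a symmetric matrix with a negative eigenvalue, a symmetric PSD matrix, and a rank-1 matrix.

```python
import math

def singular_values_2x2(A):
    """Square roots of the eigenvalues of A^T A for a real 2x2 matrix, largest first."""
    (p, q), (r, s) = A
    a, b, d = p * p + r * r, p * q + r * s, q * q + s * s
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    return [math.sqrt(mean + radius), math.sqrt(max(mean - radius, 0.0))]

examples = {
    "rotation":   [[0, -1], [1, 0]],   # eigenvalues +i and -i (complex)
    "indefinite": [[-3, 0], [0, 1]],   # symmetric, eigenvalues -3 and 1
    "psd":        [[2, 1], [1, 2]],    # symmetric PSD, eigenvalues 3 and 1
    "rank_one":   [[1, 2], [2, 4]],    # second row is twice the first
}

results = {name: singular_values_2x2(M) for name, M in examples.items()}
for name, sigmas in results.items():
    # Count singular values that are nonzero up to a loose round-off tolerance.
    rank = sum(1 for s in sigmas if s > 1e-6)
    print(name, sigmas, "rank =", rank)
```

The rotation and the indefinite matrix both give real, non-negative singular values (1, 1 and 3, 1), the PSD matrix gives back its own eigenvalues (3, 1), and the rank-1 matrix has a single nonzero singular value (5).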
For a glimpse of the geometry behind the formula, look at PCA. The principal directions of a cloud of points are the singular vectors of the (centred) data matrix, and the spread of the data along each direction is proportional to the corresponding singular value, so the directions of largest spread are the top singular value directions.
How GATE asks this
Almost always a NAT on a rank-1 or small matrix: “find the singular value(s)” or
“the sum of singular values.” The reliable recipe is σ = √(eigenvalues of AᵀA). GATE DA
2024 asked for the singular values of a rank-1 outer product uuᵀ (worked below), and
2025 continued the small-matrix theme. A common MCQ tests the concept: whether
singular values can be negative (no) and whether they equal eigenvalues in general (no —
only for symmetric PSD matrices).
Worked example — real GATE DA 2024
Let u = (1, 2, 3, 4, 5)ᵀ and M = uuᵀ (a 5-by-5 matrix). Find the sum of the singular values of M.
M = uuᵀ is an outer product, so every column is a multiple of u — the column space
is a single line and rank(M) = 1. A rank-1 matrix has exactly one nonzero singular
value, and the rest are zero. To find it, look at MᵀM:
MᵀM = (uuᵀ)ᵀ(uuᵀ) = u (uᵀu) uᵀ = ‖u‖² · uuᵀ
because uᵀu = ‖u‖² is a scalar. So MᵀM = ‖u‖² · M, and applying M = uuᵀ to u
gives Mu = u(uᵀu) = ‖u‖² u, meaning u is an eigenvector of M with eigenvalue
‖u‖². Then MᵀM u = ‖u‖² · Mu = (‖u‖²)² u, so the one nonzero eigenvalue of MᵀM is
(‖u‖²)², and the single nonzero singular value is its square root:
σ = √( (‖u‖²)² ) = ‖u‖² = 1² + 2² + 3² + 4² + 5²
= 1 + 4 + 9 + 16 + 25 = 55.
All other singular values are 0, so the sum of the singular values is 55.
The shortcut to remember: for a rank-1 matrix uuᵀ, the lone nonzero singular value is
exactly ‖u‖² (the squared length of u).
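The worked example can be double-checked with exact integer arithmetic. This is a small sketch, not an official solution: it builds M = uuᵀ, confirms that M has rank 1 and that u is an eigenvector with eigenvalue ‖u‖², and uses the fact that M is symmetric PSD, so the sum of its singular values equals the sum of its eigenvalues, which is the trace. The helper name `matvec` is an illustrative assumption.

```python
u = [1, 2, 3, 4, 5]
n = len(u)

# Outer product M = u u^T (a 5-by-5 integer matrix).
M = [[u[i] * u[j] for j in range(n)] for i in range(n)]

def matvec(matrix, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in matrix]

norm_sq = sum(x * x for x in u)
print(norm_sq)  # 55

# Rank 1: every 2x2 minor of M vanishes.
rank_one = all(M[i][j] * M[k][l] == M[i][l] * M[k][j]
               for i in range(n) for k in range(n)
               for j in range(n) for l in range(n))

# u is an eigenvector of M with eigenvalue ||u||^2.
Mu = matvec(M, u)
is_eigenvector = Mu == [norm_sq * x for x in u]

# M is symmetric, so M^T M u = M (M u), which should be (||u||^2)^2 u.
MtMu = matvec(M, Mu)
eigenvalue_of_MtM_ok = MtMu == [norm_sq ** 2 * x for x in u]

# Symmetric PSD: sum of singular values = sum of eigenvalues = trace.
trace = sum(M[i][i] for i in range(n))
print(rank_one, is_eigenvector, eigenvalue_of_MtM_ok, trace)
```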
Quick check
Practice this in an interview
PCA finds the orthogonal directions of maximum variance in the data and projects onto a lower-dimensional subspace, reducing features while retaining most information. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability, sensitivity to scale, and an assumption of linear structure.
OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations XᵀXβ = Xᵀy. When X has full column rank, their unique solution is β = (XᵀX)⁻¹Xᵀy, and the fitted values Xβ are the orthogonal projection of y onto the column space of X. The matrix X(XᵀX)⁻¹Xᵀ that performs this projection is the hat matrix.
A Vector Autoregression (VAR) model extends the univariate autoregressive model behind ARIMA to multiple time series simultaneously: each variable is regressed on its own past values and the past values of all other variables in the system. Use VAR when the series have mutual predictive relationships (Granger-causality) and you want to model those interactions; ARIMA is sufficient when one series can be forecast in isolation.
An SVM finds the hyperplane that maximises the margin, that is, the distance from the hyperplane to the nearest points of each class (the support vectors). When data is not linearly separable, the kernel trick implicitly maps inputs to a high-dimensional feature space, computing inner products there without ever materialising the transformation. This enables non-linear decision boundaries while all computation stays in the original input space.