datarekha

Singular Value Decomposition

Every matrix factors as A = UΣVᵀ with orthogonal U, V and non-negative singular values on Σ; the singular values are the square roots of the eigenvalues of AᵀA.

9 min read Advanced GATE DA Lesson 32 of 122

What you'll learn

  • SVD: A = UΣVᵀ with U, V orthogonal and Σ diagonal of non-negative singular values
  • Singular values σᵢ = √(eigenvalues of AᵀA), always ≥ 0
  • For a symmetric positive-semidefinite matrix the singular values equal the eigenvalues
  • Number of nonzero singular values = rank(A); a rank-1 uuᵀ has one nonzero singular value = ‖u‖²

Eigenvalues are a bit picky. They only describe square matrices, and even then they can come out negative or complex. The singular value decomposition doesn’t mind any of that. It works for any matrix — square or not — and produces a clean set of non-negative numbers, the singular values, that measure how much the matrix stretches space along its most important directions.

The factorisation reads A = UΣVᵀ. Geometrically it’s three simple moves: an orthogonal V picks the input directions, the diagonal Σ stretches by a singular value along each, and an orthogonal U rotates the result into the output space. Rotate, stretch, rotate — that’s every linear map.

Figure: A = U (orthogonal) × Σ = diag(σ₁, σ₂, …) × Vᵀ (orthogonal), with σᵢ ≥ 0 always. Orthogonal rotation V, non-negative stretch Σ, orthogonal rotation U; the singular values σᵢ on Σ are never negative.
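The rotate, stretch, rotate picture can be checked numerically. The sketch below is only an illustration under assumed values: the angles, the stretches 3 and 1, and the helper names `rotation`, `matmul` and `transpose` are hypothetical and not part of the lesson. It builds A from its three factors and confirms that A sends the first column of V to σ₁ times the first column of U.

```python
import math

def rotation(theta):
    """2x2 rotation matrix (orthogonal, determinant +1)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]

# Hypothetical ingredients, chosen only for illustration.
U = rotation(math.pi / 6)   # rotation into the output space
V = rotation(math.pi / 4)   # its columns are the input directions
sigma = [3.0, 1.0]          # non-negative stretches, largest first
S = [[sigma[0], 0.0], [0.0, sigma[1]]]

A = matmul(matmul(U, S), transpose(V))

# A maps the first column of V to sigma_1 times the first column of U.
v1 = [[V[0][0]], [V[1][0]]]
u1 = [U[0][0], U[1][0]]
Av1 = matmul(A, v1)
print(Av1[0][0], sigma[0] * u1[0])
print(Av1[1][0], sigma[0] * u1[1])
```

Each printed pair should agree up to rounding, and the image of v₁ has length σ₁ = 3, which is the stretch applied along that input direction.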

The central question on GATE is: where do the singular values come from? Form the matrix AᵀA. It is always symmetric positive-semidefinite, because xᵀ(AᵀA)x = ‖Ax‖² ≥ 0 for every x, so its eigenvalues are real and ≥ 0. The singular values are their square roots:

σᵢ = √( λᵢ(AᵀA) ),     σᵢ ≥ 0 always.
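For a 2×2 matrix the recipe can be carried out by hand or in a few lines of code. The following is a minimal sketch, assuming a real 2×2 input; the function name `singular_values_2x2` and the example matrix are hypothetical. It forms AᵀA, gets its two eigenvalues from the trace and determinant, and takes square roots.

```python
import math

def singular_values_2x2(a):
    """Singular values of a real 2x2 matrix, largest first."""
    (p, q), (r, s) = a
    # Entries of the symmetric matrix A^T A = [[g11, g12], [g12, g22]].
    g11 = p * p + r * r
    g12 = p * q + r * s
    g22 = q * q + s * s
    # Eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
    half_trace = (g11 + g22) / 2
    det = g11 * g22 - g12 * g12
    root = math.sqrt(max(half_trace * half_trace - det, 0.0))
    lam_big = half_trace + root
    lam_small = max(half_trace - root, 0.0)  # clamp rounding noise below zero
    return [math.sqrt(lam_big), math.sqrt(lam_small)]

# Hypothetical example: A^T A = [[25, 20], [20, 25]] has eigenvalues 45 and 5.
A = [[3.0, 0.0], [4.0, 5.0]]
s1, s2 = singular_values_2x2(A)
print(s1, s2)
```

Here the singular values are √45 and √5. A quick sanity check is that their product equals |det A| = 15.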

Three consequences worth memorising:

  • Always non-negative. Even if A has negative or complex eigenvalues, every σᵢ is a real number ≥ 0, because it is a square root of a non-negative eigenvalue of AᵀA.
  • Symmetric PSD case. If A is symmetric positive-semidefinite, its singular values equal its eigenvalues (σᵢ = λᵢ). This is the only case where the two sets coincide; for a symmetric matrix with negative eigenvalues, σᵢ = |λᵢ| instead.
  • Rank. The number of nonzero singular values equals rank(A), so Σ has exactly rank(A) nonzero entries on its diagonal.
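The three consequences can be seen on small matrices. The sketch below deliberately uses hypothetical examples whose AᵀA is already diagonal, so its eigenvalues can be read off the diagonal without an eigenvalue solver. The helper names `gram` and `singular_values_if_gram_diagonal` are made up for this illustration, and the shortcut is not valid for general matrices.

```python
import math

def gram(a):
    """Return A^T A for a matrix stored as a list of rows."""
    cols = list(zip(*a))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]

def singular_values_if_gram_diagonal(a):
    """Singular values, valid only when A^T A is diagonal (checked here)."""
    g = gram(a)
    n = len(g)
    if any(g[i][j] != 0 for i in range(n) for j in range(n) if i != j):
        raise ValueError("A^T A is not diagonal; this shortcut does not apply")
    return sorted((math.sqrt(g[i][i]) for i in range(n)), reverse=True)

def count_nonzero(values, tol=1e-12):
    """Number of values larger than a small tolerance."""
    return sum(1 for v in values if v > tol)

nilpotent = [[0, 3], [0, 0]]     # eigenvalues 0 and 0
psd = [[2, 0], [0, 5]]           # eigenvalues 2 and 5
indefinite = [[1, 0], [0, -4]]   # eigenvalues 1 and -4

sv_nilpotent = singular_values_if_gram_diagonal(nilpotent)
sv_psd = singular_values_if_gram_diagonal(psd)
sv_indefinite = singular_values_if_gram_diagonal(indefinite)

print(sv_nilpotent, count_nonzero(sv_nilpotent))  # one nonzero value: rank 1
print(sv_psd)          # same numbers as the eigenvalues
print(sv_indefinite)   # absolute values of the eigenvalues
```

The nilpotent matrix has only zero eigenvalues yet one nonzero singular value, matching its rank of 1. The PSD matrix returns its own eigenvalues, and the indefinite one returns their absolute values.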

For a glimpse of the geometry behind the formula, look at PCA. For a cloud of data points stored one per row, the principal directions are the right singular vectors of the centred data matrix, and the spread along each direction grows with the corresponding singular value. The directions of largest spread are the top singular value directions.
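The link between PCA and singular values can be sketched on a tiny 2-D cloud. The four points below are hypothetical, and the closed-form 2×2 eigenvalue step stands in for a general SVD routine, so treat this as an illustration rather than how PCA is computed in practice.

```python
import math

# Hypothetical 2-D point cloud, one point per row.
points = [(3.0, 4.0), (-1.0, 0.0), (2.0, 1.0), (0.0, 3.0)]

n = len(points)
mean_x = sum(p[0] for p in points) / n
mean_y = sum(p[1] for p in points) / n
X = [(x - mean_x, y - mean_y) for x, y in points]  # centred data matrix

# Entries of the symmetric 2x2 matrix X^T X = [[a, b], [b, c]].
a = sum(x * x for x, _ in X)
b = sum(x * y for x, y in X)
c = sum(y * y for _, y in X)

# Eigenvalues of X^T X from trace and determinant, largest first.
half_trace = (a + c) / 2
root = math.sqrt(max(half_trace * half_trace - (a * c - b * b), 0.0))
lam1, lam2 = half_trace + root, max(half_trace - root, 0.0)
sigma1, sigma2 = math.sqrt(lam1), math.sqrt(lam2)

# Eigenvector for lam1: (b, lam1 - a) when b != 0, else a coordinate axis.
if b != 0:
    vx, vy = b, lam1 - a
else:
    vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
length = math.hypot(vx, vy)
v1 = (vx / length, vy / length)  # top right singular vector of X

# Spread along v1: the squared projections sum to sigma1 squared.
spread = sum((x * v1[0] + y * v1[1]) ** 2 for x, y in X)
print(sigma1, sigma2, v1, spread)
```

For this cloud the singular values are 4 and 2, the top direction is the diagonal (1, 1)/√2, and the squared projections onto it sum to σ₁² = 16.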

How GATE asks this

Almost always a NAT on a rank-1 or small matrix: “find the singular value(s)” or “the sum of singular values.” The reliable recipe is σ = √(eigenvalues of AᵀA). GATE DA 2024 asked for the singular values of a rank-1 outer product uuᵀ (worked below), and 2025 continued the small-matrix theme. A common MCQ tests the concept: whether singular values can be negative (no) and whether they equal eigenvalues in general (no — only for symmetric PSD matrices).

Worked example — real GATE DA 2024

Let u = (1, 2, 3, 4, 5)ᵀ and M = uuᵀ (a 5-by-5 matrix). Find the sum of the singular values of M.

M = uuᵀ is an outer product, so every column is a multiple of u — the column space is a single line and rank(M) = 1. A rank-1 matrix has exactly one nonzero singular value, and the rest are zero. To find it, look at MᵀM:

MᵀM = (uuᵀ)ᵀ(uuᵀ) = u (uᵀu) uᵀ = ‖u‖² · uuᵀ

because uᵀu = ‖u‖² is a scalar. So MᵀM = ‖u‖² · M. Applying M = uuᵀ to u gives Mu = u(uᵀu) = ‖u‖² u, so u is an eigenvector of M with eigenvalue ‖u‖². Since MᵀM = ‖u‖² · M, the same u is an eigenvector of MᵀM with eigenvalue ‖u‖² · ‖u‖² = (‖u‖²)². That is the one nonzero eigenvalue of MᵀM, and the single nonzero singular value is its square root:

σ = √( (‖u‖²)² ) = ‖u‖² = 1² + 2² + 3² + 4² + 5²
                       = 1 + 4 + 9 + 16 + 25 = 55.

All other singular values are 0, so the sum of the singular values is 55.

The shortcut to remember: for a rank-1 matrix uuᵀ, the lone nonzero singular value is exactly ‖u‖² (the squared length of u).
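The worked example can be double-checked with plain arithmetic in code. This is a sketch, not part of the GATE solution: it relies on the standard identity that the squared singular values sum to the squared Frobenius norm, so for a rank-1 matrix the lone nonzero singular value equals the Frobenius norm. The variable names are illustrative.

```python
import math

u = [1, 2, 3, 4, 5]
n = len(u)
M = [[u[i] * u[j] for j in range(n)] for i in range(n)]  # M = u u^T

norm_sq = sum(x * x for x in u)  # ||u||^2

# u is an eigenvector of M with eigenvalue ||u||^2.
Mu = [sum(M[i][j] * u[j] for j in range(n)) for i in range(n)]
is_eigenvector = Mu == [norm_sq * x for x in u]

# Rank 1: M is nonzero and every 2x2 minor vanishes.
rank_one = any(any(row) for row in M) and all(
    M[i][j] * M[k][l] == M[i][l] * M[k][j]
    for i in range(n) for k in range(n)
    for j in range(n) for l in range(n)
)

# With a single nonzero singular value, sigma equals the Frobenius norm.
frob_sq = sum(M[i][j] ** 2 for i in range(n) for j in range(n))
sigma = math.sqrt(frob_sq)

print(norm_sq, is_eigenvector, rank_one, sigma)  # 55 True True 55.0
```

Both routes agree: ‖u‖² = 55, and the Frobenius norm of M is also 55, so the sum of the singular values is 55.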

Quick check

Q1. Let u = (1, 2, 2)ᵀ and M = uuᵀ. The sum of the singular values of M equals ‖u‖². Enter that value. (numerical answer)
Q2. A = [[3, 0], [0, 4]] is diagonal with positive entries (symmetric PSD). What is its largest singular value? (numerical answer)
Q3. A = [[0, 2], [0, 0]] has both eigenvalues equal to 0. What is its largest singular value? (numerical answer)
Q4. Which statements about the SVD A = UΣVᵀ are ALWAYS true? (select all that apply)
Q5. A 4×6 matrix A has rank 3. How many nonzero singular values does it have? (numerical answer)
Q6. In A = UΣVᵀ, what are U and V?

Practice this in an interview

What is PCA, when should you use it, and what are its key limitations?

PCA finds the orthogonal directions of maximum variance in the data and projects onto a lower-dimensional subspace, reducing features while retaining most information. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability, sensitivity to scale, and an assumption of linear structure.

How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution?

OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations XᵀXβ = Xᵀy. When X has full column rank the solution is unique, β = (XᵀX)⁻¹Xᵀy, and the fitted values Xβ are the orthogonal projection of y onto the column space of X. The matrix X(XᵀX)⁻¹Xᵀ that performs this projection is the hat matrix.
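For a line fitted to a handful of points, the normal equations reduce to a 2×2 system that can be solved directly. The sketch below assumes hypothetical data lying exactly on y = 1 + 2x and uses Cramer's rule instead of a matrix inverse; the variable names are illustrative.

```python
# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# The design matrix X has rows (1, x), so X^T X = [[n, sx], [sx, sxx]]
# and X^T y = [sy, sxy].
n = len(xs)
sx = sum(xs)
sxx = sum(x * x for x in xs)
sy = sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))

# Solve the 2x2 normal equations by Cramer's rule.
det = n * sxx - sx * sx
b0 = (sy * sxx - sx * sxy) / det  # intercept
b1 = (n * sxy - sx * sy) / det    # slope

# Residuals are orthogonal to both columns of X (the ones and the xs).
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
dot_ones = sum(residuals)
dot_xs = sum(r * x for r, x in zip(residuals, xs))
print(b0, b1, dot_ones, dot_xs)
```

The recovered intercept and slope are 1 and 2, and both dot products are zero, which is the projection property in coordinates.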

What is a VAR model, and when would you use it instead of a univariate ARIMA?

A Vector Autoregression (VAR) model extends the univariate autoregressive (AR) model to multiple time series simultaneously: each variable is regressed on its own past values and the past values of all other variables in the system. Use VAR when the series help predict one another (Granger causality) and you want to model those interactions; a univariate ARIMA is sufficient when one series can be forecast in isolation.

How does an SVM work, and what is the kernel trick?

An SVM finds the hyperplane that maximises the margin, the distance from the hyperplane to the nearest training points of each class (the support vectors). When data is not linearly separable, the kernel trick implicitly maps inputs to a high-dimensional feature space, computing inner products there through a kernel function without ever materialising the transformation. This enables non-linear decision boundaries while each kernel evaluation is still computed on the original inputs.
