How does PCA work, and how do you choose the number of components?

PCA finds orthogonal directions (principal components) of maximum variance by computing the eigenvectors of the covariance matrix of the centred data, then projects the data onto the top components. Choose the number of components by the cumulative explained variance ratio (e.g. enough to retain 95%), a scree-plot elbow, or downstream task performance. Standardize features first when they are measured on different scales: PCA is variance-driven, so a large-scale feature would otherwise dominate the leading components.
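A minimal sketch of the procedure just described, assuming NumPy and a small synthetic dataset. The helper name pca_eig, the toy data, and the 0.95 default threshold are illustrative choices, not part of any library:

```python
import numpy as np

def pca_eig(X, var_threshold=0.95):
    """PCA via eigendecomposition of the covariance matrix (illustrative helper)."""
    Xc = X - X.mean(axis=0)                  # centre each feature
    cov = np.cov(Xc, rowvar=False)           # (d, d) sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()          # explained variance ratio per component
    # Smallest k whose cumulative ratio reaches the threshold
    k = int(np.searchsorted(np.cumsum(ratio), var_threshold) + 1)
    k = min(k, len(ratio))                   # guard against round-off at threshold 1.0
    return Xc @ eigvecs[:, :k], ratio, k

rng = np.random.default_rng(0)
# Toy data: two strongly correlated features plus one low-variance noise feature
t = rng.normal(size=200)
X = np.column_stack([t,
                     2 * t + 0.1 * rng.normal(size=200),
                     0.1 * rng.normal(size=200)])
Z, ratio, k = pca_eig(X)
print("explained variance ratio:", ratio)
print("components kept:", k)
```

Because the first two toy features are almost collinear, nearly all of the variance falls on the first component, so the threshold rule keeps very few components here.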

What is the kernel trick in SVM, and why does it work?

The kernel trick lets an SVM find a nonlinear decision boundary by implicitly mapping data into a higher-dimensional space where it may become linearly separable, without ever computing that mapping explicitly. It works because the SVM's dual formulation depends only on dot products between points, and a kernel function returns the value of that high-dimensional dot product directly from the original inputs. Common kernels are linear, polynomial, and RBF.
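To make the "dot product without the mapping" claim concrete, here is a small check, assuming NumPy, for the homogeneous degree-2 polynomial kernel on 2-D inputs. The function names phi and poly2_kernel are illustrative, and the explicit feature map shown is one standard choice for this kernel:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (maps into 3-D)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, evaluated in the input space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))   # dot product after mapping to 3-D
implicit = poly2_kernel(x, z)       # same number, never leaving 2-D
print(explicit, implicit)
```

Both routes give the same value, but the kernel needs only a 2-D dot product and a square. For kernels such as RBF the implicit feature space is infinite-dimensional, so the explicit route is not available at all.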

What is PCA, when should you use it, and what are its key limitations?

PCA finds the orthogonal directions of maximum variance in the data and projects onto a lower-dimensional subspace, reducing features while retaining most information. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability, sensitivity to scale, and an assumption of linear structure.
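The sensitivity to scale can be seen in a small experiment. This is a sketch assuming NumPy; the height and weight columns are synthetic and hypothetical, chosen only because their units differ by orders of magnitude:

```python
import numpy as np

def first_pc(X):
    """Unit vector of the first principal component (illustrative helper)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(0)
# Synthetic, correlated features on very different scales
height_m = rng.normal(1.7, 0.1, size=500)
weight_kg = 70 + 80 * (height_m - 1.7) + rng.normal(0, 5, size=500)
X = np.column_stack([height_m, weight_kg])

pc_raw = first_pc(X)                          # dominated by the large-scale feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each column
pc_std = first_pc(X_std)                      # both features now contribute

print("PC1 on raw data:         ", pc_raw)
print("PC1 on standardized data:", pc_std)
```

On the raw data the first component is almost exactly the weight axis, simply because kilograms vary over a larger numeric range than metres. After standardization the two positively correlated features load equally on the first component.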

How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution?

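OLS minimizes the sum of squared residuals ‖y − Xβ‖². Setting the gradient of the loss to zero yields the normal equations XᵀXβ = Xᵀy. When X has full column rank, XᵀX is invertible and the unique solution is the closed form β = (XᵀX)⁻¹Xᵀy. The fitted values ŷ = Xβ = Hy are the orthogonal projection of y onto the column space of X, where H = X(XᵀX)⁻¹Xᵀ is the hat matrix.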
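A minimal numerical sketch of the closed form, assuming NumPy and synthetic data. The coefficients in beta_true and the noise level are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Design matrix: intercept column plus two random features
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations (X^T X) beta = X^T y; solve the system rather than inverting
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the optimum the residual is orthogonal to every column of X
residual = y - X @ beta_hat
print("beta_hat:", beta_hat)
print("X^T residual:", X.T @ residual)
```

Solving the linear system is generally preferred to forming the inverse explicitly, since it is cheaper and numerically better behaved. When X is nearly rank-deficient, a solver such as np.linalg.lstsq is the safer choice.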

Singular Value Decomposition — GATE DA


The decomposition that minds nothing: any matrix at all, square or rectangular, factors as A = UΣVᵀ (rotate, stretch by non-negative singular values, rotate). The singular values are the square roots of the eigenvalues of AᵀA, and the number of non-zero ones is the rank.


What you'll learn

SVD: A = UΣVᵀ with U, V orthogonal and Σ diagonal of non-negative singular values

Singular values σᵢ = √(eigenvalues of AᵀA), always ≥ 0

For a symmetric positive-semidefinite matrix the singular values equal the eigenvalues

Number of nonzero singular values = rank(A); a rank-1 uuᵀ has one nonzero σ = ‖u‖²

The last lesson’s question asked for a decomposition that works on any matrix. Eigenvalues will not do — they only describe square matrices, and even then can come out negative or complex. The singular value decomposition minds none of that. It works for any matrix at all, square or rectangular, and produces a clean set of non-negative numbers, the singular values, measuring how much the matrix stretches space along its most important directions.

The factorisation reads A = UΣVᵀ, and geometrically it is three simple moves: an orthogonal V picks the input directions, the diagonal Σ stretches along each by a singular value, and an orthogonal U rotates the result into the output space. Rotate, stretch, rotate — and that is every linear map there is.

A = UΣVᵀ

A = UΣVᵀ: orthogonal rotation V, non-negative stretch Σ, orthogonal rotation U. The singular values σᵢ on Σ are never negative.
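The factorisation can be checked numerically. This is a sketch assuming NumPy; the 2×3 matrix is an arbitrary rectangular example, not one from the lesson:

```python
import numpy as np

A = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])             # 2x3, so not square

# full_matrices=True returns U as 2x2 and Vt as 3x3
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# np.linalg.svd returns the singular values as a vector; rebuild the 2x3 Sigma
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

A_rebuilt = U @ Sigma @ Vt

print("singular values:", s)
print("U orthogonal:", np.allclose(U.T @ U, np.eye(2)))
print("V orthogonal:", np.allclose(Vt @ Vt.T, np.eye(3)))
print("A recovered:", np.allclose(A_rebuilt, A))
```

Note that np.linalg.svd returns Vᵀ rather than V, and lists the singular values in descending order.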

The exam’s central question is where the singular values come from. Form the matrix AᵀA — always symmetric and positive-semidefinite, so its eigenvalues are real and ≥ 0. The singular values are their square roots:

σᵢ = √( λᵢ(AᵀA) ),     σᵢ ≥ 0 always.

Three consequences are worth memorising:

Always non-negative: even if A has negative or complex eigenvalues, each σᵢ is a real ≥ 0, being a square root of a non-negative eigenvalue of AᵀA.
Symmetric PSD case: if A is symmetric positive-semidefinite, its singular values equal its eigenvalues (σᵢ = λᵢ) — the only general case where they coincide.
Rank: the number of non-zero singular values equals rank(A).
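The three consequences can each be checked on a small matrix. This is a sketch assuming NumPy; the matrices R, B, and C and the helper singular_values_via_gram are illustrative choices:

```python
import numpy as np

def singular_values_via_gram(A):
    """Singular values as square roots of the eigenvalues of A^T A (illustrative helper).

    Returns one value per column of A, so a wide matrix yields extra zeros.
    """
    eigvals = np.linalg.eigvalsh(A.T @ A)    # A^T A is symmetric, so eigvalsh applies
    eigvals = np.clip(eigvals, 0.0, None)    # remove tiny negative round-off
    return np.sqrt(eigvals)[::-1]            # descending, to match np.linalg.svd

# 1. Always non-negative: a 90-degree rotation has complex eigenvalues +i and -i
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
eig_R = np.linalg.eigvals(R)
sigma_R = np.linalg.svd(R, compute_uv=False)

# 2. Symmetric PSD case: eigenvalues and singular values coincide
B = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eig_B = np.linalg.eigvalsh(B)[::-1]
sigma_B = np.linalg.svd(B, compute_uv=False)

# 3. Rank: the second column of C is twice the first, so rank(C) = 1
C = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])
sigma_C = np.linalg.svd(C, compute_uv=False)
rank_C = int(np.sum(sigma_C > 1e-10))        # count singular values above a tolerance

print("R: eigenvalues", eig_R, "singular values", sigma_R)
print("B: eigenvalues", eig_B, "singular values", sigma_B)
print("C: singular values", sigma_C, "rank", rank_C)
```

In floating point a "zero" singular value is rarely exactly zero, which is why the rank count uses a small tolerance rather than a strict comparison.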

The geometry is the same one PCA exploits. In the point cloud below, the principal axes are the right singular vectors of the centred data matrix (one sample per row), and the variance along each axis is σᵢ²/(n − 1) for n data points, so the directions of largest spread are the top singular-value directions.

[Interactive figure: PCA axes on a draggable 2D point cloud. PC1 (solid) points along maximum variance; PC2 (dashed) is perpendicular. In the state shown, PC1 explains 91.7% of the variance and PC2 explains 8.3%, with the mean at (0.4, 0.3). Toggling "project onto PC1" collapses the cloud to 1D, which is the 2D-to-1D reduction in action.]
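The link between PCA and the SVD of the centred data matrix can be sketched in a few lines, assuming NumPy. The synthetic 2D cloud below is a stand-in and does not reproduce the figure's points or percentages:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
t = rng.normal(size=n)
X = np.column_stack([t, 0.5 * t + 0.3 * rng.normal(size=n)])  # elongated 2D cloud

Xc = X - X.mean(axis=0)                       # centre the cloud at the origin
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

pc1, pc2 = Vt[0], Vt[1]                       # principal axes = right singular vectors
explained = s**2 / np.sum(s**2)               # share of variance per axis
Z1 = Xc @ pc1                                 # 2D-to-1D projection onto PC1

# Cross-check: sigma_i^2 / (n - 1) equals the eigenvalues of the covariance matrix
cov_eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

print("variance explained:", explained)
print("sigma^2/(n-1):", s**2 / (n - 1), "covariance eigenvalues:", cov_eigvals)
```

Working from the SVD of the centred data avoids forming the covariance matrix explicitly, which is one reason library implementations of PCA commonly take this route.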

A worked example — real GATE DA 2024

Let u = (1, 2, 3, 4, 5)ᵀ and M = uuᵀ (a 5×5 matrix). Find the sum of the singular values of M.

M = uuᵀ is an outer product, so every column is a multiple of u — the column space is one line, rank(M) = 1. A rank-1 matrix has exactly one non-zero singular value. Applying M to u gives Mu = u(uᵀu) = ‖u‖²·u, so u is an eigenvector of M with eigenvalue ‖u‖²; the lone non-zero eigenvalue of MᵀM is (‖u‖²)², and its square root is

σ = √( (‖u‖²)² ) = ‖u‖² = 1² + 2² + 3² + 4² + 5² = 1 + 4 + 9 + 16 + 25 = 55

All other singular values are 0, so the sum of the singular values is 55. The shortcut: for a rank-1 uuᵀ, the one non-zero singular value is ‖u‖² (the squared length of u).
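The worked example is easy to confirm numerically. A short check, assuming NumPy:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
M = np.outer(u, u)                            # M = u u^T, a 5x5 rank-1 matrix

sigma = np.linalg.svd(M, compute_uv=False)    # singular values, descending
total = sigma.sum()

print("singular values:", np.round(sigma, 10))
print("sum of singular values:", total)
print("squared length of u:", u @ u)
```

The trailing singular values come out as round-off-sized numbers rather than exact zeros, so comparisons should use a tolerance.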

A question to carry forward

You now hold a whole web of always-true facts — the determinant test for singularity, the rank conditions on solutions, the eigenvalue and singular-value rules. GATE’s hardest Linear Algebra questions rarely test one in isolation; they stack several into a single “which of the following is always true?” and plant one plausible falsehood among them. Here is the thread onward: how do you reason through such a clustered statement quickly, knowing which facts are watertight and which are the classic traps?

In one breath

SVD: every matrix (any shape) factors A = UΣVᵀ — orthogonal V (rotate), diagonal Σ of non-negative singular values (stretch), orthogonal U (rotate).
Singular values σᵢ = √(eigenvalues of AᵀA), always real and ≥ 0 (even if A’s eigenvalues are negative/complex).
Count: number of non-zero σ = rank(A).
σ = λ only for symmetric PSD A; in general they differ ([[0,2],[0,0]]: σ = 2,0 but λ = 0,0).
Rank-1 uuᵀ: one non-zero singular value = ‖u‖² (e.g. u=(1,2,3,4,5) → 55).

Practice

Quick check


Q1. Recall: in A = UΣVᵀ, what are U, Σ, and V?

Q2. Trace: u = (1, 2, 2)ᵀ and M = uuᵀ. The sum of the singular values equals ‖u‖². Enter it. (Numerical answer.)

Q3. Trace: A = [[0, 2], [0, 0]] has both eigenvalues 0. What is its largest singular value? (Numerical answer.)

Q4. Apply: A = [[3, 0], [0, 4]] (symmetric PSD). What is its largest singular value? (Numerical answer.)

Q5. Apply: which statements about A = UΣVᵀ are ALWAYS true? (Select all that apply.)

Q6. Create: a 4×6 matrix A has rank 3. State the number of non-zero singular values and explain why SVD applies at all.
