datarekha

Quadratic Forms & Definiteness

The quadratic form xᵀAx measures the curvature encoded by a symmetric matrix A. Its sign is set by the eigenvalues of A, and its extreme values on the unit sphere are the largest and smallest eigenvalues, which is the heart of PCA.

8 min read Advanced GATE DA Lesson 29 of 122

What you'll learn

  • A quadratic form is xᵀAx with A symmetric
  • Positive definite means all eigenvalues are greater than 0; positive semidefinite means all eigenvalues are at least 0
  • The maximum of xᵀAx over the unit sphere is the largest eigenvalue, and the minimum is the smallest eigenvalue
  • A Gram matrix XᵀX is always positive semidefinite


A symmetric matrix has a hidden landscape inside it. Plug a direction x into the expression xᵀAx and out pops a single number — the “height” in that direction. Sweep x around and a surface emerges: maybe a bowl, maybe an inverted bowl, maybe a saddle. That expression is a quadratic form, and its shape is set entirely by the eigenvalues of A. Reading the curvature off the eigenvalues is what definiteness means, and it’s the engine behind PCA and behind every “is this minimum really a minimum?” question in optimisation.

The quadratic form and its curvature

For a symmetric matrix A, the quadratic form is the scalar xᵀAx. In two dimensions with A = [[a, b], [b, c]],

xᵀAx = a·x₁² + 2b·x₁x₂ + c·x₂²

a pure degree-2 expression. The geometry depends on the signs of the eigenvalues:

[Figure: two surfaces. Positive definite (a bowl): all λ > 0, curves up in every direction. Indefinite (a saddle): mixed signs, up one way, down another.]
Eigenvalue signs set the shape: all positive gives a bowl, mixed signs give a saddle.

Definiteness classifies the form by eigenvalue sign (always assume A symmetric):

  • Positive definite: xᵀAx > 0 for all x ≠ 0 ⇔ all eigenvalues > 0.
  • Positive semidefinite (PSD): xᵀAx ≥ 0 for all x ⇔ all eigenvalues ≥ 0.
  • Negative definite / semidefinite: flip the inequalities (all < 0 / all ≤ 0).
  • Indefinite: eigenvalues of both signs — a saddle.
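The classification above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `classify_definiteness` and the tolerance `tol` are assumptions introduced here, and the tolerance is only there because floating-point eigenvalues are rarely exactly zero.

```python
import numpy as np

def classify_definiteness(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues.

    `tol` is a hypothetical numerical threshold for treating an eigenvalue as zero.
    """
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("A must be symmetric")
    eigvals = np.linalg.eigvalsh(A)      # real eigenvalues in ascending order
    lo, hi = eigvals[0], eigvals[-1]
    if lo > tol:
        return "positive definite"
    if hi < -tol:
        return "negative definite"
    if lo < -tol and hi > tol:
        return "indefinite"
    if lo >= -tol and hi > tol:
        return "positive semidefinite"
    if hi <= tol and lo < -tol:
        return "negative semidefinite"
    return "zero form"                   # every eigenvalue is numerically zero

print(classify_definiteness([[2, 0], [0, 3]]))   # positive definite
print(classify_definiteness([[1, 2], [2, 1]]))   # indefinite (eigenvalues 3 and -1)
print(classify_definiteness([[1, 1], [1, 1]]))   # positive semidefinite (eigenvalues 2 and 0)
```

`np.linalg.eigvalsh` is used rather than `np.linalg.eigvals` because it assumes a symmetric input and returns real eigenvalues already sorted, so only the smallest and largest need to be inspected.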

A Gram matrix XᵀX is always PSD, because xᵀ(XᵀX)x = (Xx)ᵀ(Xx) = ‖Xx‖² ≥ 0 for every x. A sample covariance matrix has the same form (it is a scaled Gram matrix of the mean-centred data), which is why it is PSD.
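The Gram-matrix identity can be checked numerically. The sketch below uses a small random matrix as stand-in data (the shapes, the seed, and the variable names are assumptions chosen for illustration) and confirms that the quadratic form equals a squared norm and that the sample covariance is a scaled Gram matrix of the centred data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # hypothetical data: 5 rows, 3 columns
G = X.T @ X                          # Gram matrix, 3 x 3 and symmetric

v = rng.normal(size=3)               # an arbitrary test direction
quad = v @ G @ v                     # the quadratic form vᵀGv
norm_sq = np.linalg.norm(X @ v) ** 2 # ‖Xv‖², which can never be negative

gram_eigs = np.linalg.eigvalsh(G)    # all should be >= 0 up to rounding

# Sample covariance as a scaled Gram matrix of the mean-centred data
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (X.shape[0] - 1)

print(np.isclose(quad, norm_sq))                  # True
print(gram_eigs.min() >= -1e-10)                  # True
print(np.allclose(S, np.cov(X, rowvar=False)))    # True
```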

The optimisation fact behind PCA

Here is the result GATE tests most. Constrain x to the unit sphere (‖x‖ = 1) and ask how big xᵀAx can get. Diagonalise A = QΛQᵀ and write x in the orthonormal eigenbasis with coordinates y = Qᵀx (so ‖y‖ = ‖x‖ = 1). Then

xᵀAx = yᵀΛy = λ₁y₁² + λ₂y₂² + … + λₙyₙ²,   with  y₁² + … + yₙ² = 1

This is a weighted average of the eigenvalues, with weights yᵢ² that sum to 1. A weighted average is maximised by dumping all the weight on the largest eigenvalue and minimised by dumping it on the smallest:

[Figure: λ_min ≤ xᵀAx ≤ λ_max on ‖x‖ = 1. min = smallest λ, max = largest λ; each extreme is reached at the corresponding eigenvector.]
The constrained range of the quadratic form is exactly the eigenvalue interval.

So max₍‖x‖=1₎ xᵀAx = λ_max (achieved at the top eigenvector) and the minimum is λ_min. PCA is precisely this: the first principal direction maximises the variance xᵀΣx on the unit sphere, so it is the top eigenvector of the covariance Σ.
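One way to build confidence in this bound is to sample many unit vectors and compare the values of xᵀAx with the eigenvalues. The matrix below is an arbitrary symmetric example chosen for illustration, not one taken from the lesson, and the sample size and seed are likewise assumptions.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                        # hypothetical symmetric matrix
eigvals, Q = np.linalg.eigh(A)                    # ascending eigenvalues, eigenvectors in columns
lam_min, lam_max = eigvals[0], eigvals[-1]

rng = np.random.default_rng(0)
xs = rng.normal(size=(10_000, 2))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)   # rescale every sample to unit length
values = np.einsum("ij,jk,ik->i", xs, A, xs)      # xᵀAx for each sampled unit vector

top, bottom = Q[:, -1], Q[:, 0]                   # eigenvectors for lam_max and lam_min

print(lam_min <= values.min() and values.max() <= lam_max)   # True: samples stay inside the interval
print(np.isclose(top @ A @ top, lam_max))                    # True: maximum reached at the top eigenvector
print(np.isclose(bottom @ A @ bottom, lam_min))              # True: minimum reached at the bottom eigenvector
```

Random sampling only approaches the extremes; the eigenvectors hit them exactly, which is the point of the result.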

PCA is the cleanest place to see this max-on-the-unit-sphere result in action. However a two-dimensional data cloud is stretched or rotated, the direction of maximum variance always lies along the eigenvector of the covariance matrix with the largest eigenvalue. The variance along that direction is the maximum of xᵀΣx over the unit circle.
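The same statement can be tried on synthetic data. The sketch below builds a hypothetical 2-D cloud (the stretch factors, rotation angle, and seed are assumptions), computes the top eigenvector of its covariance, and checks that the variance of the projected data equals the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical cloud: stretched along the first axis, then rotated by 30 degrees
Z = rng.normal(size=(500, 2)) * np.array([3.0, 1.0])
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
data = Z @ R.T

Sigma = np.cov(data, rowvar=False)         # 2 x 2 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)   # ascending eigenvalues
pc1 = eigvecs[:, -1]                       # top eigenvector = first principal direction

proj_var = np.var(data @ pc1, ddof=1)      # variance of the data projected onto pc1
other_var = np.var(data @ np.array([1.0, 0.0]), ddof=1)   # variance along some other unit direction

print(np.isclose(proj_var, eigvals[-1]))   # True
print(other_var <= proj_var)               # True
```

`ddof=1` is passed to `np.var` so that it matches the normalisation `np.cov` uses by default.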

How GATE asks this

Usually a NAT or MCQ: “Maximise xᵀAx subject to ‖x‖ = 1” — the answer is the largest eigenvalue of A, full stop. Sometimes the matrix is given numerically and you compute its eigenvalues first; sometimes the eigenvalues are handed to you. The companion question asks for the minimum, which is the smallest eigenvalue. GATE DA 2025 and 2026 both ran this pattern, sometimes phrasing it as the variance maximised by PCA.

Worked example — the 2026 maximisation question

Let A be a 2×2 symmetric matrix with eigenvalues 5 and 2. Find the maximum and minimum of xᵀAx subject to ‖x‖ = 1.

Set up the eigenbasis. Write x in A’s orthonormal eigenvectors, with coordinates y₁, y₂ and y₁² + y₂² = 1. Then

xᵀAx = 5·y₁² + 2·y₂²,   with  y₁² + y₂² = 1

Maximise. Substitute y₂² = 1 − y₁²:

xᵀAx = 5y₁² + 2(1 − y₁²) = 2 + 3y₁²

This grows with y₁², which is largest (= 1) when x is the eigenvector for λ = 5. So the maximum is 5, achieved at the top eigenvector.

Minimise. The same expression 2 + 3y₁² is smallest when y₁² = 0, i.e. x is the eigenvector for λ = 2. So the minimum is 2.

The constrained range of xᵀAx is exactly [2, 5] — the smallest to the largest eigenvalue. No calculus and no matrix entries needed; only the eigenvalues matter.
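A quick numerical check of this worked example: the question gives only the eigenvalues, so the sketch below invents one matrix with eigenvalues 5 and 2 by picking an arbitrary rotation angle (an assumption, any angle would do) and then sweeps x around the unit circle.

```python
import numpy as np

theta = 0.7                                       # arbitrary eigenvector orientation (assumption)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthonormal eigenvectors in columns
A = Q @ np.diag([5.0, 2.0]) @ Q.T                 # symmetric matrix with eigenvalues 5 and 2

angles = np.linspace(0.0, 2.0 * np.pi, 100_001)
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # unit vectors on a fine grid
values = np.einsum("ij,jk,ik->i", circle, A, circle)          # xᵀAx at each grid point

# On a fine grid the extremes come out very close to 5 and 2
print(values.max(), values.min())
```

Changing `theta` rotates the eigenvectors but leaves the printed range unchanged, which matches the claim that only the eigenvalues matter.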

Quick check

Q1. A symmetric matrix A has eigenvalues 7 and 3. What is the maximum of xᵀAx subject to ‖x‖ = 1? (numerical answer)
Q2. Same matrix (eigenvalues 7 and 3). What is the minimum of xᵀAx subject to ‖x‖ = 1? (numerical answer)
Q3. Which conditions guarantee a symmetric matrix A is positive definite? (select all that apply)
Q4. Which statements about a Gram matrix G = XᵀX are true? (select all that apply)
Q5. A symmetric matrix has eigenvalues 4, 0, and −2. How is its quadratic form classified?
Q6. For A = [[3, 0], [0, 6]], the maximum of xᵀAx on ‖x‖ = 1 is 6. At which unit vector is it achieved?

Practice this in an interview

What is PCA, when should you use it, and what are its key limitations?

PCA finds the orthogonal directions of maximum variance in the data and projects onto a lower-dimensional subspace, reducing features while retaining most information. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability, sensitivity to scale, and an assumption of linear structure.

How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution?

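OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations XᵀXβ = Xᵀy. When X has full column rank, XᵀX is positive definite and the unique solution is the closed form β = (XᵀX)⁻¹Xᵀy. The fitted values Xβ are the projection of y onto the column space of X, and the matrix that performs this projection, H = X(XᵀX)⁻¹Xᵀ, is the hat matrix.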
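As a rough sketch of the closed form, the code below fits synthetic data (the coefficients, noise level, and seed are made up for illustration) by solving the normal equations and compares the result with NumPy's least-squares routine. It uses `np.linalg.solve` instead of forming the inverse explicitly, which is a common numerical choice rather than anything the formula requires.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical design matrix: an intercept column plus two random features
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
beta_true = np.array([1.0, 2.0, -0.5])              # made-up coefficients
y = X @ beta_true + 0.1 * rng.normal(size=50)

beta_normal = np.linalg.solve(X.T @ X, X.T @ y)     # solves XᵀXβ = Xᵀy
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # library least-squares solution

H = X @ np.linalg.solve(X.T @ X, X.T)               # hat matrix X(XᵀX)⁻¹Xᵀ
residual = y - H @ y                                # y minus its projection onto col(X)

print(np.allclose(beta_normal, beta_lstsq))         # True
print(np.allclose(X.T @ residual, 0.0, atol=1e-8))  # True: residual is orthogonal to the columns of X
```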

What do skewness and kurtosis measure, and what are their practical implications?

Skewness measures the asymmetry of a distribution's tails — positive skew means a longer right tail, negative skew a longer left tail. Kurtosis measures the heaviness of the tails relative to a normal distribution; excess kurtosis above zero indicates more probability mass in the tails and peak than a Gaussian, which matters for risk and outlier frequency.

How do you read ACF and PACF plots, and what do they tell you about AR and MA orders?

The ACF measures correlation between a series and its own lags including indirect effects; the PACF strips out those indirect effects to show direct correlation at each lag. A cut-off in the PACF after lag p signals an AR(p) process; a cut-off in the ACF after lag q signals an MA(q) process.
