Systems of Equations & Gaussian Elimination
Solve a linear system Ax = b by reducing the augmented matrix to row-echelon form, then classify it as having one solution, infinitely many, or none.
What you'll learn
- A linear system written as Ax = b, and its augmented matrix [A | b]
- Gaussian elimination to row-echelon form, then back-substitution
- The three outcomes: a unique solution, infinitely many, or none (inconsistent)
- Reading pivots and free variables off the reduced rows
Before you start
Two lines on a page. Either they cross at exactly one point, run parallel and
never touch, or sit on top of each other forever. That’s the whole story of a
linear system in two unknowns — and the higher-dimensional version (planes
instead of lines) is the same story with more axes. Each equation pins down a
shape; a solution is a point sitting on every shape at once. Everything that
follows is a disciplined way to find that point — or to prove there isn’t one.
Solving Ax = b is also the engine room of data work: fitting a linear regression
comes down to solving exactly such a system for the best-fit coefficients.
Packed into matrices, the system becomes Ax = b: the coefficient matrix A
holds the numbers in front of the unknowns, x is the column of unknowns, and b is
the right-hand side. Stick b onto A as one extra column and you get the
augmented matrix [A | b] — the object we actually push through elimination.
Gaussian elimination
The method is mechanical: use row operations (swap two rows, scale a row, add a multiple of one row to another) to drag the augmented matrix into row-echelon form — a staircase where each row’s leading nonzero entry (its pivot) sits to the right of the pivot above it, and any all-zero rows fall to the bottom. Once it is triangular, back-substitution reads off the unknowns from the bottom row upward.
Why bother triangulating? Because a triangular system is cheap to solve — the last
equation gives one variable directly, and each row above needs only one substitution.
Reducing a general n by n system to that form is the costly part: it takes on the
order of O(n^3) arithmetic operations. An already-triangular system skips
elimination entirely and is far cheaper (a 2025 paper made exactly this point by
counting operations). So the work is in the elimination, not the solving.
A pivot column pins down its variable. A column with no pivot corresponds to a free variable — a dial you can turn freely, which (when the system is consistent) produces infinitely many solutions.
To feel why the geometry matters, drag the entries of the matrix below. The
parallelogram it draws is the image of the unit square under A. When that
parallelogram is full-area, the system Ax = b has a unique solution for every
b; when the matrix collapses the square onto a line (determinant zero), no
unique solution exists — the system is either inconsistent or has a whole line
of answers.
- Unique solution — every variable is a pivot; the staircase pins down one point.
- No solution (inconsistent) — a row collapses to
0 = nonzero, an impossibility. - Infinitely many — the system is consistent but has at least one free variable.
How GATE asks this
Expect a small system (2 by 2 or 3 by 3) as either an MCQ (“which of these is the
solution?”) or a NAT (“solve for x, enter the value”). The reliable route is
always the same: write the augmented matrix, run elimination to echelon form, and
either back-substitute for the numbers or read the outcome off the final rows. A
common MCQ variant gives you a system and asks you to classify it — does it have a
unique solution, infinitely many, or none — which you settle by spotting pivots, a
zero row, or a contradictory row.
Worked example
Unique solution. Solve 2x + y = 5, x - y = 1.
augmented: [ 2 1 | 5 ]
[ 1 -1 | 1 ]
R2 -> R2 - (1/2)R1 (eliminate x below the first pivot):
[ 2 1 | 5 ]
[ 0 -3/2 | -3/2 ]
back-substitute: -3/2 · y = -3/2 -> y = 1
2x + 1 = 5 -> x = 2
So (x, y) = (2, 1) — one pivot per variable, one point. Both lines, one crossing.
No solution (inconsistent). Try x + y = 2, 2x + 2y = 5. The second equation is
twice the first on the left but not on the right:
[ 1 1 | 2 ] R2 -> R2 - 2R1 [ 1 1 | 2 ]
[ 2 2 | 5 ] ----------------> [ 0 0 | 1 ]
The bottom row says 0 = 1, which is false. The system is inconsistent — no
(x, y) works. The two lines are parallel.
Infinitely many (a free variable). Try x + y = 2, 2x + 2y = 4:
[ 1 1 | 2 ] R2 -> R2 - 2R1 [ 1 1 | 2 ]
[ 2 2 | 4 ] ----------------> [ 0 0 | 0 ]
The bottom row is 0 = 0 — always true, so it adds no constraint. Column 2 has no
pivot, so y is free: set y = t and x = 2 - t for any t. Infinitely many
solutions; the equations describe the same line.
Quick check
Quick check
Practice this in an interview
All questionsOLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations, whose unique solution is the projection of y onto the column space of X. The closed-form is the hat matrix formula β = (XᵀX)⁻¹Xᵀy.
The normal equation gives an exact closed-form solution in O(p³) time but becomes impractical when the number of features p is large (typically above ~10,000) because matrix inversion is cubic. Gradient descent scales as O(np) per iteration, making it the only viable option for large feature spaces or online learning.
OLS linear regression rests on five assumptions: linearity, independence of errors, homoscedasticity, normality of residuals, and no perfect multicollinearity. Violating any one of them degrades coefficient estimates, standard errors, or the validity of hypothesis tests.
Linear regression predicts unbounded real values, so it can output probabilities below 0 or above 1, and its loss function penalizes confident correct predictions. Logistic regression fixes this by applying the sigmoid to map any real score to (0,1) and optimizing log-loss, which is a proper scoring rule aligned with probability calibration.