datarekha

Systems of Equations & Gaussian Elimination

Solve a linear system Ax = b by reducing the augmented matrix to row-echelon form, then classify it as having one solution, infinitely many, or none.

8 min read Intermediate GATE DA Lesson 21 of 122

What you'll learn

  • A linear system written as Ax = b, and its augmented matrix [A | b]
  • Gaussian elimination to row-echelon form, then back-substitution
  • The three outcomes: a unique solution, infinitely many, or none (inconsistent)
  • Reading pivots and free variables off the reduced rows

Before you start

Two lines on a page. Either they cross at exactly one point, run parallel and never touch, or sit on top of each other forever. That’s the whole story of a linear system in two unknowns — and the higher-dimensional version (planes instead of lines) is the same story with more axes. Each equation pins down a shape; a solution is a point sitting on every shape at once. Everything that follows is a disciplined way to find that point — or to prove there isn’t one. Solving Ax = b is also the engine room of data work: fitting a linear regression comes down to solving exactly such a system for the best-fit coefficients.

Packed into matrices, the system becomes Ax = b: the coefficient matrix A holds the numbers in front of the unknowns, x is the column of unknowns, and b is the right-hand side. Stick b onto A as one extra column and you get the augmented matrix [A | b] — the object we actually push through elimination.

Gaussian elimination

The method is mechanical: use row operations (swap two rows, scale a row, add a multiple of one row to another) to drag the augmented matrix into row-echelon form — a staircase where each row’s leading nonzero entry (its pivot) sits to the right of the pivot above it, and any all-zero rows fall to the bottom. Once it is triangular, back-substitution reads off the unknowns from the bottom row upward.

Why bother triangulating? Because a triangular system is cheap to solve — the last equation gives one variable directly, and each row above needs only one substitution. Reducing a general n by n system to that form is the costly part: it takes on the order of O(n^3) arithmetic operations. An already-triangular system skips elimination entirely and is far cheaper (a 2025 paper made exactly this point by counting operations). So the work is in the elimination, not the solving.

A pivot column pins down its variable. A column with no pivot corresponds to a free variable — a dial you can turn freely, which (when the system is consistent) produces infinitely many solutions.

To feel why the geometry matters, drag the entries of the matrix below. The parallelogram it draws is the image of the unit square under A. When that parallelogram is full-area, the system Ax = b has a unique solution for every b; when the matrix collapses the square onto a line (determinant zero), no unique solution exists — the system is either inconsistent or has a whole line of answers.

Unique solutionlines cross onceNo solutionparallel, never meetInfinitely manysame line, fully overlap
Two equations in two unknowns meet at one point, never (parallel), or everywhere (coincident).
  • Unique solution — every variable is a pivot; the staircase pins down one point.
  • No solution (inconsistent) — a row collapses to 0 = nonzero, an impossibility.
  • Infinitely many — the system is consistent but has at least one free variable.

How GATE asks this

Expect a small system (2 by 2 or 3 by 3) as either an MCQ (“which of these is the solution?”) or a NAT (“solve for x, enter the value”). The reliable route is always the same: write the augmented matrix, run elimination to echelon form, and either back-substitute for the numbers or read the outcome off the final rows. A common MCQ variant gives you a system and asks you to classify it — does it have a unique solution, infinitely many, or none — which you settle by spotting pivots, a zero row, or a contradictory row.

Worked example

Unique solution. Solve 2x + y = 5, x - y = 1.

augmented:  [ 2   1 | 5 ]
            [ 1  -1 | 1 ]

R2 -> R2 - (1/2)R1   (eliminate x below the first pivot):
            [ 2    1   |  5   ]
            [ 0  -3/2  | -3/2 ]

back-substitute:  -3/2 · y = -3/2  ->  y = 1
                  2x + 1 = 5       ->  x = 2

So (x, y) = (2, 1) — one pivot per variable, one point. Both lines, one crossing.

No solution (inconsistent). Try x + y = 2, 2x + 2y = 5. The second equation is twice the first on the left but not on the right:

[ 1  1 | 2 ]   R2 -> R2 - 2R1   [ 1  1 | 2 ]
[ 2  2 | 5 ]  ----------------> [ 0  0 | 1 ]

The bottom row says 0 = 1, which is false. The system is inconsistent — no (x, y) works. The two lines are parallel.

Infinitely many (a free variable). Try x + y = 2, 2x + 2y = 4:

[ 1  1 | 2 ]   R2 -> R2 - 2R1   [ 1  1 | 2 ]
[ 2  2 | 4 ]  ----------------> [ 0  0 | 0 ]

The bottom row is 0 = 0 — always true, so it adds no constraint. Column 2 has no pivot, so y is free: set y = t and x = 2 - t for any t. Infinitely many solutions; the equations describe the same line.

Quick check

Quick check

0/6
Q1Solve the system 3x + 2y = 12, x - y = 1 by elimination. Enter the value of x.numerical answer — type a number
Q2Eliminating gives the augmented matrix [[1, 2, | 3], [0, 0, | 0]] for a 2-equation system in x and y. How many solutions does the system have?
Q3Eliminating a 2-by-2 system produces the row [0, 0, | 4] at the bottom. What does this tell you?
Q4Which statements about Gaussian elimination and row-echelon form are correct? (select all that apply)select all that apply
Q5For the system x + 2y = 4, 3x + 6y = 12: how many solutions does it have?
Q6A 2-by-2 system reduces to [[2, 1, | 5], [0, 3, | 9]]. Enter the value of y from back-substitution.numerical answer — type a number

Practice this in an interview

All questions
How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution?

OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations, whose unique solution is the projection of y onto the column space of X. The closed-form is the hat matrix formula β = (XᵀX)⁻¹Xᵀy.

When should you use gradient descent over the normal equation to fit a linear regression?

The normal equation gives an exact closed-form solution in O(p³) time but becomes impractical when the number of features p is large (typically above ~10,000) because matrix inversion is cubic. Gradient descent scales as O(np) per iteration, making it the only viable option for large feature spaces or online learning.

What are the core assumptions of linear regression, and what breaks when each is violated?

OLS linear regression rests on five assumptions: linearity, independence of errors, homoscedasticity, normality of residuals, and no perfect multicollinearity. Violating any one of them degrades coefficient estimates, standard errors, or the validity of hypothesis tests.

Why is linear regression unsuitable for binary classification, and what specific problems does logistic regression fix?

Linear regression predicts unbounded real values, so it can output probabilities below 0 or above 1, and its loss function penalizes confident correct predictions. Logistic regression fixes this by applying the sigmoid to map any real score to (0,1) and optimizing log-loss, which is a proper scoring rule aligned with probability calibration.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content