How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution?

OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations XᵀXβ = Xᵀy. When X has full column rank, their unique solution is β = (XᵀX)⁻¹Xᵀy, and the fitted values Xβ are the orthogonal projection of y onto the column space of X. The matrix that performs this projection, H = X(XᵀX)⁻¹Xᵀ, is the hat matrix.
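The derivation can be written out line by line. This is a sketch under the assumption that X is an n-by-p matrix with full column rank, so XᵀX is invertible; the symbols L (the loss), ŷ (the fitted values) and H are labels introduced here for illustration.

```latex
\begin{aligned}
L(\beta) &= \lVert y - X\beta \rVert_2^2 = (y - X\beta)^\top (y - X\beta) \\
\nabla_\beta L(\beta) &= -2\,X^\top (y - X\beta) = 0 \\
X^\top X\,\beta &= X^\top y \qquad \text{(normal equations)} \\
\hat\beta &= (X^\top X)^{-1} X^\top y \\
\hat y &= X\hat\beta = X (X^\top X)^{-1} X^\top y = H\,y
\end{aligned}
```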

When should you use gradient descent over the normal equation to fit a linear regression?

The normal equation gives an exact closed-form solution, but forming XᵀX costs O(np²) and solving the resulting p-by-p system costs O(p³), so it becomes impractical when the number of features p is large (roughly above 10,000, as a rule of thumb). Gradient descent costs O(np) per iteration, which makes it the usual choice for large feature spaces, for data that does not fit in memory, or for online learning.
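To make the comparison concrete, here is a minimal sketch that fits y ≈ a + b·x both ways on a tiny made-up dataset. The function names, the learning rate, the step count and the data are all assumptions chosen for illustration; with one feature the closed form reduces to the familiar slope and intercept formulas.

```python
def fit_closed_form(xs, ys):
    """Closed-form least squares for one feature plus an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Minimise mean squared error by batch gradient descent."""
    n = len(xs)
    a, b = 0.0, 0.0
    for _ in range(steps):
        # Residuals under the current parameters.
        residuals = [a + b * x - y for x, y in zip(xs, ys)]
        grad_a = 2.0 * sum(residuals) / n
        grad_b = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Toy data, invented for this example.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]
closed = fit_closed_form(xs, ys)
descent = fit_gradient_descent(xs, ys)
```

On a problem this small the two answers agree to many decimal places; the trade-off described above only starts to matter once p (and n) grow.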

What are the core assumptions of linear regression, and what breaks when each is violated?

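OLS linear regression rests on five assumptions: linearity, independence of errors, homoscedasticity, normality of errors, and no perfect multicollinearity. Each violation breaks something different. A nonlinear relationship biases the coefficient estimates. Correlated or heteroscedastic errors leave the coefficients unbiased but make the usual standard errors wrong. Non-normal errors mainly undermine hypothesis tests and confidence intervals in small samples. Perfect multicollinearity makes XᵀX singular, so the coefficients are no longer uniquely determined.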

Why is linear regression unsuitable for binary classification, and what specific problems does logistic regression fix?

Linear regression predicts unbounded real values, so it can output "probabilities" below 0 or above 1, and its squared-error loss penalizes predictions that are confidently correct: for a true label of 1, a score of 3 is charged as heavily as a score of −1. Logistic regression fixes this by applying the sigmoid to map any real score to (0,1) and optimizing log-loss, which is a proper scoring rule aligned with probability calibration.
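A small sketch of the two losses side by side may help. The helper names and the sample scores below are illustrative assumptions, not part of any particular library.

```python
import math


def sigmoid(z):
    """Map a real score z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y, p):
    """Negative log-likelihood of label y (0 or 1) under predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def squared_loss(y, score):
    """Squared error between a label and a raw linear score."""
    return (y - score) ** 2


scores = [-3.0, 0.0, 2.5]
probabilities = [sigmoid(z) for z in scores]

# A raw score of 2.5 for a true label of 1 is on the correct side,
# yet squared loss still charges (1 - 2.5)^2 for it.
penalty_linear = squared_loss(1, 2.5)

# Log-loss on the sigmoid output shrinks as the score moves further
# in the correct direction.
penalty_logistic = log_loss(1, sigmoid(2.5))
```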

Systems of Equations & Gaussian Elimination — GATE DA

Imagine a small tea stall. One morning you buy two samosas and one tea, and you pay forty rupees. The next morning you buy one samosa and one tea from the same stall, and you pay twenty-five. Nobody told you the price of a single samosa or a single tea. Yet you feel you could work it out. Let us see how.

You have two clues, and two unknowns hiding inside them. Each clue, on its own, allows many prices: a samosa could be ten and a tea twenty, or a samosa five and a tea thirty, and the first clue would still hold. The honest answer is the one pair of prices that obeys both clues at once. That is the whole idea of a system of equations: several conditions, and the single choice that satisfies all of them together.

When the obvious way runs out

With two unknowns you can juggle the two clues in your head. Subtract the second from the first and the tea cancels, leaving the samosa. Easy enough. But suppose the stall sold five things and you had five receipts. Now the juggling slips — which clue did you already use, and which one is still waiting? The head-method does not scale.

So here is the turn. How do we solve such a puzzle in a way that never loses track, and works the same whether there are two unknowns or ten? We need a routine, not a clever trick.

A tidy staircase

Think of tidying the clues into a staircase. We rearrange them so the first clue still carries every unknown, the next clue carries one fewer, and the last clue carries just one. A staircase stepping down to a single stair. That bottom stair holds one unknown alone, so we read it off at once; then we climb back up, and each higher stair needs only what the stairs below have already told us.

The staircase picture has one catch: unlike a real staircase, a step here sometimes flattens into nothing, leaving a clue that turns into 0 = 0 or, worse, 0 = 5. Hold that thought. Those flattened steps turn out to be the most interesting part of the whole lesson.

Giving the routine its name

Lay the numbers out in a grid. The coefficients sitting in front of the unknowns form the coefficient matrix A; the unknowns stack into a column x; the amounts you paid stack into a column b. The two clues, written together, become Ax = b. Glue b onto A as one extra column and you get the augmented matrix [A | b] — the single object we actually tidy.
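As a rough sketch, the stall's two clues (two samosas and a tea for 40, one samosa and a tea for 25) can be laid out this way in Python. The helper name `augment` and the list-of-rows layout are assumptions made for illustration.

```python
# Coefficient matrix A and right-hand side b for the stall's two clues:
#   2s + t = 40
#    s + t = 25
A = [[2, 1],
     [1, 1]]
b = [40, 25]


def augment(A, b):
    """Return [A | b] as a list of rows, with b appended as the last column."""
    return [row + [rhs] for row, rhs in zip(A, b)]


augmented = augment(A, b)
```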

The tidying itself, done by rows, is called Gaussian elimination, and the staircase it produces is called row-echelon form.

What the routine is, precisely

Gaussian elimination uses only three row operations — swap two rows, multiply a row by a nonzero number, or add a multiple of one row to another — to reduce [A | b] to row-echelon form: a shape where each row’s first nonzero entry, its pivot, sits to the right of the pivot in the row above, and any all-zero rows sink to the bottom. Once the staircase is built, back-substitution reads the unknowns off from the bottom row upward.

The work is all in the tidying. A staircase system is cheap to solve, but reducing a general n-by-n system to that staircase costs on the order of n³ operations. That elimination is the expensive step; the back-substitution that follows costs only on the order of n² and is cheap by comparison.
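The routine can be sketched in a few lines of Python. This is a minimal version under stated assumptions: the system is square and has exactly one solution, and exact fractions are used so that any nonzero entry can serve as a pivot. The function name `solve` is chosen here for illustration.

```python
from fractions import Fraction


def solve(A, b):
    """Solve Ax = b for a square system with exactly one solution."""
    n = len(A)
    # Build the augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]

    # Forward elimination: create zeros below each pivot.
    for col in range(n):
        # Find a row at or below `col` with a nonzero entry and swap it up.
        pivot_row = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot_row is None:
            raise ValueError("no unique solution: a column has no pivot")
        M[col], M[pivot_row] = M[pivot_row], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            # Row r becomes row r minus factor times the pivot row.
            M[r] = [x - factor * p for x, p in zip(M[r], M[col])]

    # Back-substitution: read the unknowns from the bottom row upward.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


# The stall: 2s + t = 40 and s + t = 25.
prices = solve([[2, 1], [1, 1]], [40, 25])
```

The nested loops in the elimination phase are where the n³ cost comes from; the back-substitution loop touches each entry above the diagonal only once.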

Watching it happen, step by step

Let us return to the stall and solve it in full, writing the matrix after every move. Let s be the price of a samosa and t the price of a tea.

two samosas and a tea cost 40:   2s + t = 40
one samosa and a tea cost 25:     s + t = 25

augmented matrix [A | b]:
  [ 2  1 | 40 ]
  [ 1  1 | 25 ]

We want a zero under the first pivot (the 2). Subtract half of row 1 from row 2, written R2 → R2 − ½·R1:

  [ 2   1  | 40 ]
  [ 0  1/2 |  5 ]     ← the s-term in row 2 is now gone

The staircase is built. The bottom stair holds one unknown alone, so read it, then climb back up:

bottom row:  (1/2)·t = 5      →  t = 10
top row:     2s + t = 40
             2s + 10 = 40     →  s = 15

So a samosa is ₹15 and a tea is ₹10. One price each — exactly one answer. Check it against both clues: 2(15) + 10 = 40 and 15 + 10 = 25. Both hold. The two clues, drawn as lines, cross at a single point.
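The single row operation used in this example, R2 → R2 − ½·R1, can be sketched as a small helper. The name `add_multiple` and its argument order are assumptions for illustration; rows are indexed from zero, so R1 is row 0 and R2 is row 1.

```python
from fractions import Fraction


def add_multiple(matrix, target, source, factor):
    """Return a new matrix where row `target` becomes row `target` + factor * row `source`."""
    result = [list(row) for row in matrix]
    result[target] = [
        t + factor * s for t, s in zip(matrix[target], matrix[source])
    ]
    return result


stall = [[Fraction(2), Fraction(1), Fraction(40)],
         [Fraction(1), Fraction(1), Fraction(25)]]

# R2 -> R2 - (1/2) * R1, written with zero-based row indices 1 and 0.
staircase = add_multiple(stall, target=1, source=0, factor=Fraction(-1, 2))
```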

When a step vanishes

Now change just the numbers and watch the bottom stair flatten. Suppose the two clues were x + y = 2 and 2x + 2y = 5. Tidy as before, R2 → R2 − 2·R1:

  [ 1  1 | 2 ]        [ 1  1 | 2 ]
  [ 2  2 | 5 ]   →    [ 0  0 | 1 ]

The bottom row now reads 0 = 1. That is simply false — no pair (x, y) can make nothing equal to one. The system is inconsistent: it has no solution. The two clues, as lines, are parallel and never meet.

Change one number more — make the second clue 2x + 2y = 4 — and the same tidying gives a different flat step:

  [ 1  1 | 2 ]        [ 1  1 | 2 ]
  [ 2  2 | 4 ]   →    [ 0  0 | 0 ]

This bottom row reads 0 = 0. Always true, and so it asks nothing of x and y. The second column never got a pivot, so y is a free variable — a dial you may turn to anything, say y = t, which fixes x = 2 − t. There are infinitely many solutions, because the two clues were secretly the same line.
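A quick way to convince yourself that every setting of the dial works is to test a few. The sketch below checks the family (2 − t, t) against both clues of the system x + y = 2, 2x + 2y = 4; the function name and the range of t values are arbitrary choices for illustration.

```python
def satisfies(x, y):
    """Check both clues of the coincident system: x + y = 2 and 2x + 2y = 4."""
    return x + y == 2 and 2 * x + 2 * y == 4


# Turn the dial y = t through a few integer values; x follows as 2 - t.
family = [(2 - t, t) for t in range(-3, 4)]
all_valid = all(satisfies(x, y) for x, y in family)
```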

Read the three endings off the final rows:

Two equations in two unknowns, drawn as lines, meet at one point, never (parallel), or everywhere (coincident).

One solution — every variable owns a pivot; the staircase pins down a single point.
No solution — a row collapses to 0 = nonzero, an impossibility.
Infinitely many — every row is consistent, but at least one variable is free.
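The three endings can be read off mechanically. The sketch below row-reduces an augmented matrix and reports which case applies; the function name `classify` and its return strings are assumptions for illustration, and exact fractions are used to avoid rounding questions.

```python
from fractions import Fraction


def classify(augmented):
    """Row-reduce [A | b] and report 'one', 'none', or 'infinite'."""
    M = [[Fraction(v) for v in row] for row in augmented]
    rows, cols = len(M), len(M[0]) - 1  # cols counts unknowns, not b
    pivot_count = 0
    for col in range(cols):
        # Look for a usable pivot at or below the next unused row.
        pivot_row = next(
            (r for r in range(pivot_count, rows) if M[r][col] != 0), None
        )
        if pivot_row is None:
            continue  # no pivot in this column: its variable is free
        M[pivot_count], M[pivot_row] = M[pivot_row], M[pivot_count]
        for r in range(pivot_count + 1, rows):
            factor = M[r][col] / M[pivot_count][col]
            M[r] = [x - factor * p for x, p in zip(M[r], M[pivot_count])]
        pivot_count += 1

    # Zeros on the left with a nonzero right-hand side means 0 = nonzero.
    for row in M:
        if all(v == 0 for v in row[:cols]) and row[cols] != 0:
            return "none"
    return "one" if pivot_count == cols else "infinite"


stall = classify([[2, 1, 40], [1, 1, 25]])
parallel = classify([[1, 1, 2], [2, 2, 5]])
coincident = classify([[1, 1, 2], [2, 2, 4]])
```

The three calls at the end use the stall system and the two flattened systems from this section.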

The same split has a picture. Drag the entries of the matrix below and watch the unit square get carried to a parallelogram. While that parallelogram has nonzero area, Ax = b has exactly one answer for every b. The moment the matrix squashes the square flat onto a line, the single crossing point is lost, and the system tips into either the parallel case or the same-line case.

Interactive figure: linear maps. A matrix is a function on space; its two columns are where î and ĵ land (col 1 = î, col 2 = ĵ).

Drag the tip of î and ĵ: they are the matrix's two columns. Everything else follows linearly, so the whole grid warps with them. The shaded region's area is |det|; flip a column past the other and orientation reverses (det goes negative). In the setting shown, the determinant (signed area) is 1.25, so areas scale by 1.25×.
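For a 2-by-2 matrix the signed area is a one-line computation. The sketch below applies it to the coefficient matrices from this lesson; the function name `det2` and its argument order (row by row) are assumptions for illustration.

```python
def det2(a, b, c, d):
    """Determinant of [[a, b], [c, d]]: the signed area of the image of the unit square."""
    return a * d - b * c


# Coefficient matrix of the stall system: 2s + t = 40, s + t = 25.
stall = det2(2, 1, 1, 1)

# Coefficient matrix shared by the two flattened systems (x + y, 2x + 2y).
flattened = det2(1, 1, 2, 2)
```

A nonzero result corresponds to the one-solution case; a zero result means the square has been squashed flat, and the right-hand side then decides between no solution and infinitely many.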

A question to carry forward

Suppose a 3-by-3 system tidies down to a staircase whose bottom row is all zeros, 0 = 0, while the two rows above are perfectly fine. The system is not broken — but is the answer a single point, or something larger? Hold the staircase in mind and ask: with one stair flattened, how many unknowns are left truly pinned down, and how many are free to roam?

In one breath

A system is several conditions at once; the solution is the choice obeying all of them.
Write it Ax = b, glue b on to form the augmented matrix [A | b].
Gaussian elimination (swap, scale, add-a-multiple) tidies it into a staircase (row-echelon form); back-substitution reads the unknowns from the bottom up.
A pivot pins a variable; a column with no pivot is a free variable.
Three endings, read off the last rows: every variable a pivot → one solution; a row 0 = nonzero → no solution; a row 0 = 0 with a free variable → infinitely many.

Practice

Quick check

Q1. Recall: in the augmented matrix [A | b], what is the column b?

Q2. Trace: solve 2x + y = 5, x − y = 1 by elimination and enter the value of x. (Numerical answer.)

Q3. Trace: a 2-by-2 system reduces to [[2, 1 | 5], [0, 3 | 9]]. Enter y from back-substitution. (Numerical answer.)

Q4. Apply: eliminating a 2-by-2 system in x and y leaves the bottom row [0, 0 | 4]. What does this mean?

Q5. Apply: for x + 2y = 4 and 3x + 6y = 12, how many solutions are there?

Q6. Apply: which statements about row-echelon form are correct? (Select all that apply.)

Q7. Create: write the augmented matrix for the stall's other day (three samosas and two teas cost 65, one samosa and one tea cost 25), then say which operation clears the lower-left entry first.
